The principal focus of this work is the application of fast techniques for Toeplitz
eigendecomposition to improve the computational efficiency of the subspace
algorithms for frequency estimation of sinusoidal signals. Two forms of fast Toeplitz
eigendecomposition algorithms are developed: the first is an improved version of an
earlier fast Toeplitz algorithm that computes any eigenvalue and the associated
eigenvector of a real symmetric $M \times M$ Toeplitz matrix in $O(M^2)$ floating-point operations;
the second employs the recently developed "superfast" techniques for solving Toeplitz
systems to compute any eigenvalue and the associated eigenvector in $O(M \log^2 M)$
operations, a lower asymptotic complexity than any previous method.

A principal  concern  in  the  use  of  these  fast  and  superfast  algorithms  is  their 
numerical  instability,  which  may  lead  to  inaccurate  results  in  rare  cases.  In  this 
work,  reliable,  efficient  tests  are  described  for  assuring  the  accuracy  of  the  computed 
eigenvalues  and  eigenvectors.  These  tests  are  based  on  residual  bounds,  and  employ 
fast  algorithms  for  Toeplitz  matrix-vector  multiplication,  making  their  use  practical
even  with  the  superfast  eigendecomposition  algorithms. 

To  provide  a benchmark  for  evaluating  frequency  estimates  computed  by  the 
new  techniques,  a new  bound  for  the  estimation  of  the  frequencies  of  sinusoidal  signals 
is  derived.  The  Bhattacharyya  bound,  an  extension  of  the  well-known  Cramer-Rao 
bound,  is  a lower  bound  to  the  variance  of  any  unbiased  estimator  of  the  frequencies 
of  sinusoidal  signals.  This  bound  is  the  tightest  known  bound  for  this  problem,  and 
is  significantly  tighter  than  the  Cramer-Rao  bound  in  cases  where  the  signal-to-noise 
ratio  is  low,  the  number  of  samples  is  small,  or  the  sinusoids  are  closely  spaced  in 
frequency. 

Finally, the new techniques for Toeplitz eigendecomposition are used to produce
a more efficient variant of the ESPRIT algorithm for frequency estimation. The
performance of this fast variant is compared with that of the original ESPRIT algorithm
in a wide variety of situations, and it is shown that in many cases the fast algorithms
can produce a more accurate estimate with less computation by employing a larger
Toeplitz estimate of the autocorrelation matrix, rather than the covariance estimate
used in conventional implementations. In cases where long data records are
available, this fast Toeplitz ESPRIT can produce more accurate frequency estimates with
less computation than the standard ESPRIT algorithm.

CHAPTER  1
INTRODUCTION 


In many signal processing applications, it is necessary to determine the
frequencies of sinusoidal signals from a data record that is corrupted by additive noise.
Estimating the parameters of sinusoidal signals in noise is one of the oldest problems
in the field of signal processing, dating back at least as far as the 19th century work of
Kelvin and Schuster on analysis and prediction of tides and temperatures. Problems
of this type occur not only in analysis of time series data, but also in array bearing
estimation, that is, the estimation of the directions of arrival of signals from data
gathered by a sensor array.

The first techniques to be applied to this problem were the classical spectral
estimators, such as the periodogram, which are based on the Fourier transform. These
estimators provide good performance for both high and low signal-to-noise ratios,
and are simple to compute. If an $M \times M$ autocorrelation matrix is used, for large
$M$ the classical techniques require a number of operations proportional to $M \log M$;
generally, the notation $O(M \log M)$ will be used to indicate the asymptotic complexity
of algorithms as the problem size becomes large. The principal shortcoming of the
classical estimators is their low resolution, that is, their inability to separate signals
with small frequency differences. Later, techniques based on modeling the signal
as an autoregressive process were developed. These approaches have much higher
resolution than classical techniques, and at $O(M^2)$, require only a modest amount
of computation. However, their performance deteriorates rapidly as the
signal-to-noise ratio decreases, and they are actually inferior to classical techniques at low
signal-to-noise ratios.

In the past 20 years, a class of estimators based on concepts drawn from linear
algebra has been developed. These techniques, commonly referred to as subspace
methods, include the Pisarenko harmonic decomposition, MUSIC, the principal
components method, and ESPRIT. They combine high resolution with good performance
at low signal-to-noise ratios, and are among the most accurate methods known for
frequency estimation. The principal disadvantage of the subspace methods is their
computational burden: all of the widely used subspace techniques perform an
eigendecomposition of an estimate of the autocorrelation matrix, which requires $O(M^3)$
operations when computed by conventional methods. If $M$ is large, the difference
between $O(M^3)$ and the $O(M \log M)$ of the classical methods, or the $O(M^2)$ of
autoregressive modeling, can be prohibitive.

Since the computational requirements of all of the techniques used for
estimating frequencies increase with $M$, it is tempting to reduce the length of the
autocorrelation estimate in order to reduce the computational load. Unfortunately, a
basic result in the estimation theory of sinusoidal signals is that the best achievable
variance for an unbiased frequency estimate improves as the cube of the number of
samples. In practice, this means that reducing $M$ degrades the accuracy of the
estimate severely. If accurate frequency estimates are required, $M$ must be as large as is
practical.

Frequency  estimation  problems  therefore  present  a difficult  choice  between  the 
subspace  methods,  which  offer  high  resolution  and  good  performance  at  all  signal- 
to-noise  ratios,  but  require  a large,  possibly  prohibitive  amount  of  computation,  and 
autoregressive  or  classical  methods,  which  are  less  demanding  computationally  but 
have  important  drawbacks  such  as  poor  performance  at  low  signal-to-noise  ratios  or 
lower  resolution. 

Recently, methods have been developed that greatly reduce the computation
required to find eigenvalues and eigenvectors of Toeplitz matrices. These methods
employ well-known techniques for solving Toeplitz systems, such as the
Levinson-Durbin algorithm, to compute an eigenvalue and the associated eigenvector in $O(M^2)$
operations. By computing a Toeplitz estimate of the autocorrelation matrix, and
employing these methods to compute its eigendecomposition, the efficiency of the
subspace techniques can be dramatically improved. The principal goal of this work
is to develop these techniques and to characterize their performance, in terms of
both speed and accuracy, in comparison to standard implementations of the subspace
techniques.

To  provide  a means  for  evaluating  the  performance  of  both  the  standard  and 
the  new  techniques,  an  analysis  of  statistical  bounds  on  the  estimation  of  frequencies 
of  sinusoids  has  been  performed.  In  addition  to  a discussion  of  the  well-known  Cramer- 
Rao  bound,  an  extended  version  of  this  bound,  called  the  Bhattacharyya  bound,  is 
derived  for  the  first  time.  This  result  provides  a tighter  bound  on  the  variance  of 
estimators  of  sinusoidal  frequencies,  and  is  much  tighter  than  the  Cramer-Rao  bound 
in  cases  of  low  signal-to-noise  ratios,  short  data  records,  or  signals  whose  frequencies 
are  very  closely  spaced.  The  new  bound  substantially  narrows  the  gap  between  the 
performance  of  existing  estimators  and  the  tightest  known  bounds,  and  may  be  a 
first  step  toward  an  understanding  of  the  limits  on  the  performance  of  frequency 
estimators. 

Previous fast Toeplitz eigensolvers were $O(M^2)$, and questions existed
concerning their numerical stability. In the present work, the efficiency of earlier fast Toeplitz
eigensolvers has been further improved by employing recently developed "superfast"
algorithms for solving Toeplitz systems, which reduce the complexity of the
eigendecomposition. The resulting algorithm computes an eigenvalue and the associated
eigenvector of a Toeplitz matrix in $O(M \log^2 M)$ operations; this is the lowest
asymptotic complexity of any known algorithm for the Toeplitz eigenproblem. To address
the issue of numerical stability, a set of simple, efficient tests for verifying the
accuracy of an eigendecomposition of a Toeplitz matrix is presented. These tests are
reliable, useful for both the earlier fast techniques and the new superfast algorithms,
and have negligible computational cost.
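
The accuracy tests themselves are developed later in this work and are not reproduced here. As a rough illustration of the idea only, the following sketch multiplies a symmetric Toeplitz matrix by a vector in $O(M \log M)$ operations using circulant embedding and the FFT, and uses the product to form an eigenpair residual. It relies on the standard fact that, for a real symmetric matrix $T$ and a unit vector $v$, some eigenvalue of $T$ lies within $\|Tv - \lambda v\|_2$ of $\lambda$. The function names `toeplitz_matvec` and `residual_norm` are hypothetical and are not taken from the original.

```python
import numpy as np

def toeplitz_matvec(r, v):
    """Return T @ v, where T is the symmetric Toeplitz matrix with first column r.

    T is embedded in a 2M x 2M circulant matrix, whose product with a
    zero-padded copy of v is a circular convolution computed with the FFT.
    """
    r = np.asarray(r, dtype=float)
    v = np.asarray(v, dtype=float)
    M = len(r)
    # First column of the circulant embedding: r, a zero, then r reversed without r[0].
    c = np.concatenate([r, [0.0], r[:0:-1]])
    prod = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(v, n=2 * M), n=2 * M)
    return prod[:M]

def residual_norm(r, lam, v):
    """2-norm of T v - lam v after scaling v to unit length."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.linalg.norm(toeplitz_matvec(r, v) - lam * v)
```

A small residual certifies that the computed eigenvalue is close to a true eigenvalue, at a cost that is low compared with the eigendecomposition itself.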

Finally, the fast Toeplitz eigendecomposition techniques have been used to
develop a fast variant of the ESPRIT algorithm for frequency estimation. The
performance of the fast method has been compared to a conventional implementation,
and it has been shown that the fast method can often produce more accurate
frequency estimates with less computation. Although the use of a Toeplitz estimate
degrades the accuracy of the estimate somewhat, the fast Toeplitz algorithms make
it possible to use much larger autocorrelation matrices. In cases where long data
records are available, or when the signal-to-noise ratio is low, the increased accuracy
due to the use of a larger matrix more than compensates for the loss due to the use
of a Toeplitz estimate.


CHAPTER  2 

SIGNAL  MODELS  AND  STATISTICAL  LIMITS  ON  ESTIMATION 

The fundamental goal of any estimation technique is to produce an accurate
estimate $\hat{x}$ of an unknown parameter $x$; in this chapter, we will consider limits on
the accuracy of estimation of sinusoidal frequencies. The accuracy of an estimator is
typically defined in terms of its bias and variance: an ideal estimation technique would
have zero bias and the smallest possible variance. It is not immediately apparent that
there is any lower bound to the variance; however, the performance of any estimator
of a random quantity is limited by statistical bounds.

Statistical  bounds  are  derived  from  a model  of  the  random  process  that  is 
assumed  to  have  produced  the  received  signal,  and  apply  only  when  this  assumption 
is  valid.  Many  techniques  exist  for  bounding  the  performance  of  estimators;  in  this 
chapter,  the  Cramer-Rao  and  Bhattacharyya  bounds  for  estimation  of  the  frequencies 
of  real  sinusoids  in  additive  white  Gaussian  noise  are  derived.  Both  these  bounds  set 
lower  limits  to  the  variance  of  any  unbiased  estimate  of  a parameter  of  the  random 
process,  providing  a useful  benchmark  for  evaluating  the  performance  of  estimation 
techniques;  although  the  Cramer-Rao  bounds  have  been  derived  previously  [1],  the 
Bhattacharyya  bounds  are  a new  result.  To  calculate  either  of  these  bounds,  it  is  first 
necessary  to  define  the  random  process  that  is  assumed  to  be  the  source  of  the  signal. 

Modeling  the  Signal 

The frequency estimation problem may be simply stated: given a set of
observed data $y[k]$, containing $L$ points, and assuming that the data consist of sinusoids
in additive noise, estimate the frequencies of the sinusoids. The problem may also be
formulated in the following equivalent manner: given a data record, find the
frequencies of a sinusoidal model that provides the best fit (in some appropriate measure)
to the data. We will briefly review the two most widely used models for the signals
encountered in frequency estimation: the deterministic and stochastic signal models.

The  Deterministic  Model 

A signal composed of sinusoids in stationary noise may be modeled as the
output of a random process that is parameterized by the amplitudes, frequencies,
and phases of the sinusoids, and by the variance and autocorrelation of the noise.
The underlying signal whose parameters are to be estimated is modeled as a
real-valued signal $x[k]$ consisting of $N$ sinusoids:

\[ x[k] = \sum_{i=1}^{N} a_i \cos(\omega_i k + \phi_i). \tag{2.1} \]

It will be assumed that there are exactly $N$ distinct sinusoids (that is, $a_i > 0$ and
$\omega_i \neq \omega_j$ if $i \neq j$), that the frequencies are below the Nyquist limit ($0 < \omega_i < \pi$), and
that the phases are unambiguously specified ($-\pi < \phi_i < \pi$), for all $i \in \{1 \ldots N\}$.

The probabilistic model for the observed data $y[k]$ is the noisy signal model $\hat{y}[k]$:

\[ \hat{y}[k] = \sum_{i=1}^{N} a_i \cos(\omega_i k + \phi_i) + n[k], \tag{2.2} \]

where $n[k]$ is the noise. For convenience in the derivation, the observed data $y[k]$,
$k \in \{k_0 + 1, \ldots, k_0 + L\}$, where $k_0$ is the offset of the start of the data record, may
be grouped into a vector $y$; similarly, the model signals $x[k]$ and $\hat{y}[k]$ may be referred
to as $x$ and $\hat{y}$. Although the noisy signal model is a random process, the underlying
noise-free signal $x$ is deterministic, and it is from this fact that the model takes
its name. The deterministic signal model is a nonstationary, non-Gaussian random
process.

Given the model $\hat{y}$, and the set of observed data $y$, the problem is to determine
the parameters of $\hat{y}$ so that if the data are the output of a process like that assumed in
the model, the parameters of $\hat{y}$ are good estimates of the parameters of the observed
data. We will assume that it is known a priori that the use of the sinusoidal model is
appropriate.
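
To make the signal model concrete, the following sketch generates a noise-free signal of the form (2.1) and a noisy record of the form (2.2) over the sample indices $k_0 + 1, \ldots, k_0 + L$. It assumes the zero-mean white Gaussian noise adopted in the derivations that follow; the function names, argument names, and the `seed` parameter are illustrative choices, not part of the original.

```python
import numpy as np

def sinusoid_model(amps, freqs, phases, L, k0=0):
    """Noise-free signal x[k] = sum_i a_i cos(w_i k + phi_i), k = k0+1, ..., k0+L."""
    k = np.arange(k0 + 1, k0 + L + 1)
    x = np.zeros(L)
    for a, w, p in zip(amps, freqs, phases):
        x += a * np.cos(w * k + p)
    return x

def noisy_record(amps, freqs, phases, sigma2, L, k0=0, seed=0):
    """Noisy record: the model signal plus white Gaussian noise of variance sigma2."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=L)
    return sinusoid_model(amps, freqs, phases, L, k0) + noise

# Example: two closely spaced sinusoids observed over 64 samples.
y = noisy_record([1.0, 0.8], [1.00, 1.05], [0.3, -1.2], sigma2=0.25, L=64)
```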

In the following derivations, we will take $n[k]$ to be a zero-mean white Gaussian
noise process with variance $\sigma^2$. This form of the deterministic model has been
essentially the only one treated, because of the complexity of the derivation for colored or
non-Gaussian noise. Bounds for the case of colored noise may be derived by following
the same steps with the appropriate joint probability density function for the noise
samples $n[k]$, although the algebraic complications are formidable. For non-Gaussian
noise whose probability density function $p(w)$ is exponential, the likelihood function
is also exponential, and a procedure similar to that given below may be followed to
yield the desired bounds. Non-Gaussian noise with a non-exponential density
function introduces additional complications to the derivation, since expressions of the
form $\frac{\partial}{\partial \Theta_i} \frac{1}{P(\Theta)}$, where $P(\Theta)$ is the likelihood function defined below, arise in the
derivation of bounds. If $p(w)$ is not exponential, these expressions no longer have a
simple form.

For the white noise case, the parameters of the noisy signal model (2.2) are the amplitudes
$a_i$, the frequencies $\omega_i$, and the phases $\phi_i$ of the sinusoids, and the variance $\sigma^2$ of the
noise. The complete set of parameters may be arranged into a vector $\Theta$:

\[ \Theta = [a_1, \omega_1, \phi_1, \ldots, a_N, \omega_N, \phi_N, \sigma^2]^T. \tag{2.3} \]

If the parameters $\Theta$ of the model and the observed data $y$ are given, the
probability density of $y$ is given by

\[ p(y \mid \Theta, \hat{y}) = (2\pi\sigma^2)^{-L/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{k=k_0+1}^{k_0+L} \left( y[k] - \sum_{i=1}^{N} a_i \cos(\omega_i k + \phi_i) \right)^2 \right). \tag{2.4} \]

If the signal vector $y$ is fixed, and the parameters $\Theta$ of the model are considered as
variables, then the expression above is referred to as the likelihood function of the
parameters $\Theta$. To indicate whether an expression is employed as a probability density
function or a likelihood, the notation $p(y \mid \Theta, \hat{y})$, in which $\hat{y}$ denotes the signal model,
will be used for density functions, and $P(\Theta)$ for the corresponding likelihood functions.

Maximum  Likelihood  Estimation 

One method for estimating the parameters of a signal is simply to maximize
the likelihood function $P(\Theta)$ for the observed data; this is maximum likelihood
estimation. Maximum likelihood estimators have several attractive properties, including
consistency (as the number of samples approaches infinity, the estimate approaches
the true value) and asymptotic efficiency (as the number of samples approaches
infinity, the variance of the estimate approaches the minimum possible variance for an
unbiased estimator, which is the Cramer-Rao bound). These two attractive
properties, however, hold only for infinite data records. One property that does hold for
finite records is that if any estimator is efficient (i.e., achieves the Cramer-Rao bound)
for a finite number of samples, then the maximum likelihood approach will find an
efficient estimator.

In practice, we are most concerned with the performance of an estimator on
finite data records; for example, the subspace techniques' principal advantage over
the much simpler classical approaches is that subspace techniques can resolve signals
closely spaced in frequency using fewer samples of the data. The attractive properties
of the maximum likelihood estimator are for the most part asymptotic properties, and
since it has been shown that the maximum likelihood estimator is not efficient for
finite sample lengths when noise is present [2], the maximum likelihood estimator is
not necessarily superior to any other for this problem.

One  major  barrier  to  the  use  of  the  maximum  likelihood  estimator  for  frequency 
estimation  is  that  the  maximum  of  P(0)  is  very  difficult  to  find.  The  likelihood 
function  is  a very  complex  function,  with  many  local  maxima  [3],  and  to  be  certain 
of  locating  the  global  maximum,  it  is  necessary  to  have  a very  good  estimate  of  the 
frequencies  as  a starting  point  for  numerical  maximization.  In  fact,  subspace  methods 
are  often  used  to  generate  the  starting  points  for  maximum  likelihood  estimators. 
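
The difficulty with local maxima can be seen even for a single sinusoid. In white Gaussian noise, maximizing the likelihood is equivalent to minimizing the sum of squared residuals, and for a fixed trial frequency the best amplitude and phase follow from a linear least-squares fit to $\cos(\omega k)$ and $\sin(\omega k)$. The sketch below scans the energy captured by that fit over a frequency grid and locates its local maxima. It is a simplified illustration under these assumptions, not the estimator studied in this work, and the function names are hypothetical.

```python
import numpy as np

def concentrated_fit(y, omega, k0=0):
    """Energy of y captured by the best-fitting sinusoid at frequency omega.

    Maximizing this quantity over omega is equivalent to maximizing the
    single-sinusoid likelihood in white Gaussian noise.
    """
    y = np.asarray(y, dtype=float)
    k = np.arange(k0 + 1, k0 + len(y) + 1)
    H = np.column_stack([np.cos(omega * k), np.sin(omega * k)])
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return float(np.sum((H @ coef) ** 2))

def scan(y, grid, k0=0):
    """Evaluate the concentrated fit on every frequency in grid."""
    return np.array([concentrated_fit(y, w, k0) for w in grid])

def local_maxima(values):
    """Indices of interior grid points larger than both neighbors."""
    v = np.asarray(values)
    return np.flatnonzero((v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])) + 1

if __name__ == "__main__":
    k = np.arange(1, 33)
    y = np.cos(1.0 * k + 0.3)  # noise-free record, 32 samples
    grid = np.linspace(0.05, np.pi - 0.05, 2000)
    values = scan(y, grid)
    print("local maxima:", len(local_maxima(values)))
    print("global maximum near:", grid[np.argmax(values)])
```

A numerical optimizer started near one of the secondary maxima will converge to it rather than to the global maximum, which is why a good initial frequency estimate is needed.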

The  Stochastic  Model 

As  we  will  see  later  in  this  chapter,  the  deterministic  signal  model  is  relatively 
detailed,  and  calculations  based  on  this  model  often  become  impractically  complex. 
In  those  cases,  a simpler  model,  referred  to  as  the  stochastic  signal  model,  may 
be  used  in  place  of  the  deterministic  model  to  make  the  analysis  more  tractable. 
The stochastic model assumes the signal is a stationary zero-mean Gaussian random
process, parameterized by its (Toeplitz) autocorrelation matrix $\Sigma$; the likelihood
function of this model is given by

\[ p(y \mid \Sigma, \hat{y}) = \frac{1}{(2\pi)^{L/2} \det(\Sigma)^{1/2}} \exp\left( -y^T \Sigma^{-1} y / 2 \right). \]

This appears to be a major departure from the deterministic signal model;
however, the two models are in fact related in a simple way by the assumptions made
about the coefficients of the sinusoidal model. In the deterministic model, all the
parameters of the signal were assumed to be (unknown) constants. If the amplitudes
in a sinusoidal signal model are taken instead to be Rayleigh distributed random
variables, and the phases to be uniformly randomly distributed between $-\pi$ and $\pi$,
independent of the amplitudes, then the output of the sinusoidal model is zero-mean,
stationary, and Gaussian [4, p. 48], exactly as assumed in the stochastic signal model.
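
As an illustration of the stochastic model, the sketch below builds the Toeplitz autocorrelation matrix of sinusoids with independent, uniformly distributed phases in white noise. It uses the standard result that such a signal has autocorrelation $r[m] = \sum_i p_i \cos(\omega_i m) + \sigma^2 \delta[m]$ with $p_i = \langle a_i^2 \rangle / 2$; the symbol $p_i$ and the function name are introduced here for illustration and do not appear in the original.

```python
import numpy as np

def autocorrelation_matrix(powers, freqs, sigma2, M):
    """M x M Toeplitz autocorrelation matrix of random-phase sinusoids in white noise.

    powers[i] is the mean power <a_i^2>/2 of sinusoid i, freqs[i] its
    frequency in radians per sample, and sigma2 the noise variance.
    """
    lags = np.arange(M)
    r = np.zeros(M)
    for p, w in zip(powers, freqs):
        r += p * np.cos(w * lags)
    r[0] += sigma2  # white noise contributes only at lag zero
    # Entry (i, j) depends only on |i - j|, which makes the matrix Toeplitz.
    return r[np.abs(np.subtract.outer(lags, lags))]

Sigma = autocorrelation_matrix([1.0, 0.5], [0.6, 1.4], sigma2=0.1, M=8)
eigenvalues = np.linalg.eigvalsh(Sigma)  # ascending order
```

With $N$ real sinusoids the signal part of the matrix has rank $2N$, so the $M - 2N$ smallest eigenvalues all equal the noise variance. This separation into signal and noise eigenvalues is the structure that the subspace methods exploit.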


Statistical Bounds

Once a model for the signal has been chosen, it is possible to bound the
accuracy of estimators of the parameters of the model. There are many techniques
for determining statistical bounds; in this section we will derive bounds using two
closely related methods: the Cramer-Rao approach [5], and a generalization of this
approach due to Bhattacharyya [6].

The Cramer-Rao Bound

The simplest and most widely used of the statistical bounds on estimation is
the Cramer-Rao bound. Using the notation $\langle x \rangle$ to denote the expected value of $x$, we
may define a $K \times K$ matrix $J$, derived from a set of $K$ parameters $\{\Theta_{m_1}, \ldots, \Theta_{m_K}\}$
of the model. The elements of $J$ are given by

\[ J(i,j) = \left\langle \frac{1}{P(\Theta)} \frac{\partial P(\Theta)}{\partial \Theta_{m_i}} \cdot \frac{1}{P(\Theta)} \frac{\partial P(\Theta)}{\partial \Theta_{m_j}} \right\rangle, \tag{2.5} \]

where any set of $K$ model parameters may be chosen as long as the necessary
derivatives exist and are linearly independent. The matrix $J$ is known as the Fisher
information matrix. If a vector of partial derivatives $v$ is defined,

\[ v = \frac{1}{P(\Theta)} \left[ \frac{\partial P(\Theta)}{\partial \Theta_{m_1}}, \frac{\partial P(\Theta)}{\partial \Theta_{m_2}}, \ldots, \frac{\partial P(\Theta)}{\partial \Theta_{m_K}} \right]^T, \tag{2.6} \]

then $J$ can also be written in a more compact form as $J = \langle vv^T \rangle$. Since the
partial derivatives in $v$ are required to be linearly independent, $J$ is nonsingular. The
inverse of the Fisher information matrix determines a lower limit for the variance of
any unbiased estimator; this limit, the Cramer-Rao bound, is given by the following
theorem:


Theorem 2.1 (Cramer-Rao Bound) If $\hat{\Theta}_i$ is an unbiased estimator of a
parameter $\Theta_i$ of a random process, $J = \langle vv^T \rangle$ is a nonsingular $n \times n$ matrix constructed from
the likelihood function $P(\Theta)$ of the random process according to equation (2.6), and
all second partial derivatives of $P(\Theta)$ with respect to the included $\Theta_i$ are continuous,
then $\mathrm{var}(\hat{\Theta}_i) \geq J^{-1}(i,i)$ for all $i \in \{1 \ldots n\}$.

(Since  the  Cramer-Rao  bound  is  a special  case  of  the  Bhattacharyya  bound,  the  proof 
will  be  deferred  until  the  next  section.) 
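
As a numerical illustration of Theorem 2.1, the sketch below evaluates the Fisher information matrix for sinusoids in white Gaussian noise and inverts it to obtain bounds on the frequency estimates. It relies on the standard result that, for a deterministic signal $x[k]$ in white Gaussian noise of variance $\sigma^2$, the Fisher information for the signal parameters is $\frac{1}{\sigma^2} \sum_k \frac{\partial x[k]}{\partial \Theta_i} \frac{\partial x[k]}{\partial \Theta_j}$, and that $\sigma^2$ decouples from the signal parameters, so omitting it does not change the frequency bounds. The function names and the parameter ordering $(a_i, \omega_i, \phi_i)$ are assumptions made for this example.

```python
import numpy as np

def fisher_information(amps, freqs, phases, sigma2, L, k0=0):
    """Fisher information matrix for the parameters (a_i, w_i, phi_i) of each sinusoid."""
    k = np.arange(k0 + 1, k0 + L + 1)
    columns = []
    for a, w, p in zip(amps, freqs, phases):
        arg = w * k + p
        columns.append(np.cos(arg))            # dx[k]/da_i
        columns.append(-a * k * np.sin(arg))   # dx[k]/dw_i
        columns.append(-a * np.sin(arg))       # dx[k]/dphi_i
    G = np.column_stack(columns)               # L x 3N matrix of derivatives
    return G.T @ G / sigma2

def crb_frequencies(amps, freqs, phases, sigma2, L, k0=0):
    """Cramer-Rao lower bounds on the variances of unbiased frequency estimates."""
    J = fisher_information(amps, freqs, phases, sigma2, L, k0)
    return np.diag(np.linalg.inv(J))[1::3]

# Doubling the record length reduces the bound by roughly a factor of eight.
ratio = (crb_frequencies([1.0], [1.0], [0.3], 0.5, 200)[0]
         / crb_frequencies([1.0], [1.0], [0.3], 0.5, 400)[0])
```

The ratio computed at the end reflects the cubic dependence of the best achievable variance on the number of samples.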

The derivation of the Cramer-Rao bound does not require that all the
parameters of $P(\Theta)$ be included in $J$. It is frequently the case that only a subset of the
parameters is of interest; for example, in frequency estimation, the problem is to
estimate the $\omega_i$. The remaining parameters, in this case the $a_i$, the $\phi_i$, and $\sigma^2$, need
not be included in $J$ to determine a bound for the frequency estimates. However,
including all the parameters of the model yields a tighter bound, as shown by the
following theorem, a slight generalization of a result due to Rife and Boorstyn [7].

Theorem 2.2 Let $a$ and $b$ be real random vectors of lengths $n$ and $n+1$, respectively,
with $n$ of their elements identical, and let $\Pi$ be a permutation matrix that rearranges
$b$ so that the first $n$ elements of $\Pi b$ are identical to $a$:

\[ \begin{bmatrix} a \\ \alpha \end{bmatrix} = \Pi b. \]

If $A = \langle aa^T \rangle$, $B = \langle bb^T \rangle$, both $A^{-1}$ and $B^{-1}$ exist, and $C = \Pi B^{-1} \Pi^T$, then
$C(i,i) \geq A^{-1}(i,i)$ for all $i \in \{1 \ldots n\}$.
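
Proof: Since

\[ \Pi B \Pi^T = \left\langle \begin{bmatrix} a \\ \alpha \end{bmatrix} \begin{bmatrix} a^T & \alpha \end{bmatrix} \right\rangle = \begin{bmatrix} \langle aa^T \rangle & \langle a\alpha \rangle \\ \langle \alpha a^T \rangle & \langle \alpha^2 \rangle \end{bmatrix} = \begin{bmatrix} A & v \\ v^T & \gamma \end{bmatrix}, \]

using the partitioned matrix inversion lemma and the orthogonality of $\Pi$, we find

\[ \Pi B^{-1} \Pi^T = \begin{bmatrix} A^{-1} + \beta A^{-1} v v^T A^{-1} & -\beta A^{-1} v \\ -\beta v^T A^{-1} & \beta \end{bmatrix}, \]

where $\beta = (\gamma - v^T A^{-1} v)^{-1}$. The leading $n \times n$ submatrix of $C$ is $A^{-1} + \beta A^{-1} v v^T A^{-1}$,
and for $C(i,i) \geq A^{-1}(i,i)$ for $i \in \{1 \ldots n\}$ it is necessary and sufficient that the
diagonal elements of $\beta D = \beta A^{-1} v v^T A^{-1}$ are greater than or equal to zero. $A$ is
symmetric, since $\langle a_i a_j \rangle = \langle a_j a_i \rangle$; its inverse is therefore symmetric, and
$A^{-1} v v^T A^{-1} = (A^{-1} v)(v^T A^{-T}) = cc^T = D$. Because $D$ is an outer product, it is positive
semidefinite, and therefore $D(i,i) \geq 0$. In addition, $1/\beta = \gamma - v^T A^{-1} v$ is greater than or
equal to zero, since $\langle (\alpha - v^T A^{-1} a)^2 \rangle \geq 0$ and

\begin{align*}
\left\langle (\alpha - v^T A^{-1} a)^2 \right\rangle &= \left\langle \alpha^2 - 2 v^T A^{-1} a \alpha + v^T A^{-1} a a^T A^{-1} v \right\rangle \\
&= \langle \alpha^2 \rangle - 2 v^T A^{-1} \langle a \alpha \rangle + v^T A^{-1} \langle a a^T \rangle A^{-1} v \\
&= \gamma - v^T A^{-1} v.
\end{align*}

The fact that $\Pi B^{-1} \Pi^T$ must be invertible implies that $\beta > 0$. Therefore, because
$C(i,i) = A^{-1}(i,i) + \beta D(i,i)$, and both $\beta$ and $D(i,i)$ are greater than or equal to zero,
$C(i,i) \geq A^{-1}(i,i)$ for all $i \in \{1 \ldots n\}$. □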
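
Theorem 2.2 can be checked numerically. The sketch below takes a correlation matrix $B$ whose leading $n \times n$ block is $A$ (that is, $\Pi$ is taken to be the identity, an assumption made only to keep the example short) and compares the diagonals of the two inverses. The function name is hypothetical.

```python
import numpy as np

def bound_diagonals(B, n):
    """Return the diagonal of inv(A) and the first n diagonal entries of inv(B),
    where A is the leading n x n block of the symmetric positive definite matrix B."""
    A = B[:n, :n]
    a_diag = np.diag(np.linalg.inv(A))
    c_diag = np.diag(np.linalg.inv(B))[:n]
    return a_diag, c_diag

# Sample correlation matrix <b b^T> estimated from 1000 random vectors of length 5.
rng = np.random.default_rng(1)
samples = rng.standard_normal((5, 1000))
B = samples @ samples.T / samples.shape[1]
a_diag, c_diag = bound_diagonals(B, n=4)
```

Each entry of `c_diag` is at least as large as the corresponding entry of `a_diag`, in agreement with the theorem: including an additional element can only raise, never lower, the diagonal of the inverse.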

Since the Cramer-Rao bound is derived from an outer product of the form
treated above, this result is useful in a variety of situations; we will see in the following
section that the Bhattacharyya bound also has this form. Two important corollaries
are given here. First, application of Theorem 2.2 and Theorem 2.1 indicates how the
tightest possible Cramer-Rao bound may be obtained.

Corollary 2.3 The tightest Cramer-Rao bound on the variance of an unbiased
estimator $\hat{\Theta}_i$ of a parameter $\Theta_i$ of a random process is obtained when the derivatives of
$P(\Theta)$ with respect to all parameters of the random process are included in $v$, as long
as the resulting $J$ is nonsingular.

A similar  result  applies  when  there  is  more  than  one  signal  present,  since  additional 
signals  simply  mean  that  the  model  has  more  parameters. 

Corollary  2.4  The  Cramer-Rao  bound  on  the  variance  of  an  unbiased  estimator 
of  the  frequency  of  a single  sinusoid  is  also  a lower  bound  for  the  variance  of  any 
unbiased  estimator  of  the  frequency  of  that  sinusoid  when  other  sinusoids  are  present. 

The  proof  of  these  corollaries  is  by  induction  using  Theorem  2.2.  The  fact  that  a 
bound  for  one  or  two  signals  is  also  a bound  for  many  signals  is  extremely  important 
in  practice,  because  it  dramatically  reduces  the  amount  of  computation  required  to 
produce  a usable  bound;  rather  than  considering  all  the  signals  present,  only  those 
closest  in  frequency  to  the  signal  in  question  are  used  to  compute  the  bound.  When 
two  closely  spaced  sinusoids  are  well  separated  from  the  other  signals,  this  approach 
yields  a good  approximation  to  the  exact  bound. 
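
Corollary 2.4 can be illustrated by comparing the frequency bound for a sinusoid alone with the bound for the same sinusoid when a second, closely spaced sinusoid is present. The sketch assumes white Gaussian noise, for which the Fisher information of the signal parameters is the scaled Gram matrix of the signal derivatives; the function name and the parameter ordering are illustrative choices.

```python
import numpy as np

def frequency_crb(amps, freqs, phases, sigma2, L, k0=0):
    """Cramer-Rao bounds on the frequencies of sinusoids in white Gaussian noise."""
    k = np.arange(k0 + 1, k0 + L + 1)
    columns = []
    for a, w, p in zip(amps, freqs, phases):
        arg = w * k + p
        # Derivatives of the signal with respect to frequency, amplitude, and phase.
        columns += [-a * k * np.sin(arg), np.cos(arg), -a * np.sin(arg)]
    G = np.column_stack(columns)
    J = G.T @ G / sigma2
    return np.diag(np.linalg.inv(J))[0::3]

single = frequency_crb([1.0], [1.0], [0.0], 0.5, 32)[0]
pair = frequency_crb([1.0, 1.0], [1.0, 1.1], [0.0, 0.0], 0.5, 32)[0]
# 'pair' is never smaller than 'single', so the single-sinusoid bound remains valid.
```

The single-sinusoid value is cheaper to compute and is still a valid lower bound, although it becomes loose when the second sinusoid is close in frequency.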

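The Bhattacharyya Bound

The Cramer-Rao bound may be viewed as the simplest form of a more general
bound, first developed by Bhattacharyya [6]. Define a generalization of the derivative
vector $v$ that includes derivatives of higher orders:

\[ w = \frac{1}{P(\Theta)} \left[ \frac{\partial P(\Theta)}{\partial \Theta_i}, \ldots, \frac{\partial^2 P(\Theta)}{\partial \Theta_{j_1} \partial \Theta_{j_2}}, \ldots, \frac{\partial^m P(\Theta)}{\partial \Theta_{k_1} \partial \Theta_{k_2} \cdots \partial \Theta_{k_m}}, \ldots \right]^T. \tag{2.7} \]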

As before, all of the partial derivatives included in $w$ must exist and be linearly
independent; to simplify the notation, we will also assume that all the first partial
derivatives of $P(\Theta)$ are grouped together at the head of $w$, as indicated in equation (2.7).
When $P(\Theta)$ has continuous $n$th derivatives with respect to the parameters in $\Theta$,

\[ \frac{\partial^n P(\Theta)}{\partial \Theta_{i_1} \cdots \partial \Theta_{i_n}} = \frac{\partial^n P(\Theta)}{\partial \Theta_{k_1} \cdots \partial \Theta_{k_n}}, \]

where $\{k_1 \ldots k_n\}$ is any permutation of $\{i_1 \ldots i_n\}$. This means that only one
derivative of a given order with respect to a given set of parameters may be included in $w$.
As in the case of the Cramer-Rao bound, the tightest bound for a given maximum
order is obtained when all linearly independent derivatives of this order or less are
included. If the highest order included in $w$ is $m$, then the matrix $K_m$ is defined as

\[ K_m = \langle ww^T \rangle. \tag{2.8} \]

By analogy with the Fisher information matrix, $K_m$ is referred to as the $m$th order
Bhattacharyya information matrix; the first order Bhattacharyya information matrix
is just the Fisher information matrix.
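
As a small worked case, consider a single scalar parameter $\theta$ and the second order matrix $K_2$. The symbols $b$ and $c$ below are shorthand introduced only for this example; the bound is the $(1,1)$ element of $K_2^{-1}$, following the same pattern as Theorem 2.1.

```latex
\[
w = \frac{1}{P(\theta)}
    \begin{bmatrix} \partial P(\theta)/\partial\theta \\ \partial^2 P(\theta)/\partial\theta^2 \end{bmatrix},
\qquad
K_2 = \langle w w^T \rangle = \begin{bmatrix} J & b \\ b & c \end{bmatrix},
\]
\[
K_2^{-1}(1,1) = \frac{c}{Jc - b^2} = \frac{1}{J - b^2/c} \;\geq\; \frac{1}{J}.
\]
```

Here $J$ is the (scalar) Fisher information, and $K_2$ is assumed to be positive definite, so that $c > 0$ and $Jc - b^2 > 0$. The second order bound exceeds the Cramer-Rao bound $1/J$ whenever the first and second derivative terms are correlated ($b \neq 0$), and coincides with it when $b = 0$.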

Theorem 2.5 (Bhattacharyya Bound) If $\hat{\Theta}_i$ is an unbiased estimator of a
parameter $\Theta_i$ of a random process, $K = \langle ww^T \rangle$ is a nonsingular $n \times n$ matrix constructed
from the likelihood function $P(\Theta)$ of the random process according to equation (2.7),
where the first $k$ elements of $w$ contain all of the included first partial derivatives,
the highest order partial derivative contained in $w$ is $m$, and all partial derivatives of
order $m + 1$ of $P(\Theta)$ with respect to the $\Theta_i$ are continuous, then $\mathrm{var}(\hat{\Theta}_i) \geq K^{-1}(i,i)$
for all $i \in \{1 \ldots n\}$.
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Proof:  Consider  the  matrix  given  by 


i / 

(0-0) 

1 

A=( 

• 

(0  — 0)T  WT 

\ 

w 

where  © and  © contain  the  same  parameters  in  the  same  order,  and  the  order  is 
chosen  so  that  it  matches  the  order  of  the  first  partial  derivatives  in  w,  so  that  if 

<9P(0)  <9P(0)  dP(@) 

00!  ’ 002  J ’ 

then  0 = [©x,  02, . . . , ©fc]r,  and  © = [0i,  02, . . . , @jt]T.  Since  0 is  an  unbiased 
estimator,  ((0*  — ©j))  = 0.  Taking  partial  derivatives, 


w 


P(0) 


dm 

deildei2...dQim 


= 0. 


If  P(0)  has  m 4-  1 continuous  partial  derivatives,  the  order  of  differentiation  and 
expectation  (that  is,  integration)  may  be  interchanged,  giving 


c7‘ 


dQ^dQi 2 . . . d 0 


g-((e,-e,))  = / 


sm({ei-eiMy|©,y)) 


dQhdQi2 . ..dOi7 
dmQj 

d&ild&i2 . . . 0©tm 


dy 


--I 

_ f dmp{ y|0,y) 

J deildQi2...dQimUiay 

= - J dijp(y\Q,y)dy 

r 1 <9mP(0) 

J P(G)dGildQi2...dGim 

= ~&ij  ~ (wj  ■ 0j) 


p(y|0,y)dy 


0j  P(0)  dy 


0, 
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where  we  have  used  the  index  j to  indicate  the  element  of  w associated  with  the  given 
partial  derivative.  This  means  that  —Sij  = ( WjQi ),  or 


Ce  -Z 

-Z  K 


where 


Z = 

Because  it  is  an  autocorrelation  matrix, 
can  be  written  as 


Ife  0 

A is  symmetric  and  positive  semidefinite;  it 


I* 

-ZK  1 

Cq  - ZK_1Z 

0 

I/t  o 

A = 

0 

In— A: 

0 

K 

1 

1 

2 

HH 

h 

N 

7 

a 

1 

1 

By the Sylvester inertia principle, $C_{\hat{\Theta}} - Z K^{-1} Z^T$ is also positive semidefinite, and so
its diagonal elements are greater than or equal to zero. Since the diagonal elements
of $C_{\hat{\Theta}}$ are the variances of the estimates, and $Z K^{-1} Z^T$ is the $k \times k$ leading principal
submatrix of $K^{-1}$, $\mathrm{var}(\hat{\Theta}_i) - K^{-1}(i, i) \ge 0$. □
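
The final step of the proof can be illustrated numerically. The following is only a sketch under stated assumptions: the helper name `schur_complement`, the random test matrix, and the block sizes are hypothetical and are not taken from the text. It forms a symmetric positive semidefinite matrix, partitions it as in the proof, and checks that the Schur complement of the lower-right block is again positive semidefinite.

```python
import numpy as np

def schur_complement(A, k):
    """Return C - B K^{-1} B^T for A = [[C, B], [B^T, K]], with C of size k x k."""
    C, B, K = A[:k, :k], A[:k, k:], A[k:, k:]
    return C - B @ np.linalg.solve(K, B.T)

# Hypothetical example: any matrix of the form G G^T is symmetric and
# positive semidefinite, like the autocorrelation matrix A in the proof.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 8))
A = G @ G.T

S = schur_complement(A, 2)
smallest_eigenvalue = np.linalg.eigvalsh(S).min()
```

Because the diagonal of a positive semidefinite matrix is nonnegative, the diagonal of `S` plays the role of $\mathrm{var}(\hat{\Theta}_i) - K^{-1}(i,i)$ in the theorem.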


From  Theorem  2.2,  we  can  see  that  the  tightest  Bhattacharyya  bound  of  a 
given  maximum  order  is  obtained  by  including  all  the  linearly  independent  partial 
derivatives  of  that  order  or  less.  Two  additional  important  corollaries  are: 


Corollary  2.6  The  Bhattacharyya  bound  on  the  variance  of  an  unbiased  estimator 
of  the  frequency  of  a single  sinusoid  is  also  a lower  bound  for  the  variance  of  any 
unbiased  estimator  of  the  frequency  of  that  sinusoid  when  other  sinusoids  are  present. 

Corollary 2.7 The Bhattacharyya bound on the variance of an unbiased estimator
is at least as tight as the Cramer-Rao bound obtained using the same set of first
derivatives.

We  will  now  describe  the  computation  of  these  bounds  for  the  sinusoidal  signal  model. 

Derivation  of  Bounds 

The  calculation  of  the  bounds  described  above  for  a given  signal  is  not  concep- 
tually difficult,  but  the  algebraic  manipulations  required  when  using  the  deterministic 
signal  model  can  become  extremely  complex.  The  computation  of  the  Cramer-Rao 
and  second  order  Bhattacharyya  bounds  for  frequency  estimators  of  signals  with  one 
and  two  real  sinusoidal  components  in  additive  white  Gaussian  noise  is  discussed  in 
the  following  sections. 

It  is  often  convenient  in  the  derivation  of  bounds  to  employ  the  logarithm  of 
the likelihood function, $F(\Theta) = \log(P(\Theta))$, referred to as the log-likelihood function.
For real sinusoidal signals in white Gaussian noise, $F(\Theta)$ is given by

\[
F(\Theta) = -\frac{L}{2} \log(2\pi\sigma^2)
 - \frac{1}{2\sigma^2} \sum_{k=k_0+1}^{k_0+L} \left( y[k] - \sum_{i=1}^{N} a_i \cos(\omega_i k + \phi_i) \right)^2 . \tag{2.9}
\]
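
As a concrete reading of (2.9), the log-likelihood can be evaluated directly from a data record. This is a minimal sketch; the function name `log_likelihood` and its argument layout are assumptions made for illustration, not part of the original work.

```python
import math

def log_likelihood(y, k0, sigma2, amps, freqs, phases):
    """Evaluate F(Theta) of (2.9) for the samples y[k0+1], ..., y[k0+L].

    y is a list holding those L samples in order; amps, freqs and phases
    hold a_i, omega_i and phi_i for each sinusoid in the model.
    """
    L = len(y)
    residual_energy = 0.0
    for offset, sample in enumerate(y):
        k = k0 + 1 + offset
        model = sum(a * math.cos(w * k + p)
                    for a, w, p in zip(amps, freqs, phases))
        residual_energy += (sample - model) ** 2
    return (-0.5 * L * math.log(2.0 * math.pi * sigma2)
            - residual_energy / (2.0 * sigma2))
```

When the data match the model exactly, the residual term vanishes and only the $-\frac{L}{2}\log(2\pi\sigma^2)$ term remains.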

The  utility  of  the  log-likelihood  function  arises  from  the  identities 

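\[
\frac{\partial F(\Theta)}{\partial \Theta_i} = \frac{1}{P(\Theta)} \frac{\partial P(\Theta)}{\partial \Theta_i} ,
\]

\[
\frac{\partial^2 F(\Theta)}{\partial \Theta_i \partial \Theta_j}
 + \frac{\partial F(\Theta)}{\partial \Theta_i} \frac{\partial F(\Theta)}{\partial \Theta_j}
 = \frac{1}{P(\Theta)} \frac{\partial^2 P(\Theta)}{\partial \Theta_i \partial \Theta_j}
\]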

used  in  the  derivation  of  the  vectors  v and  w,  which  determine  the  Cramer-Rao  and 
Bhattacharyya  bounds. 
Derivation of the Cramer-Rao Bound

Cramer-Rao  bounds  for  the  estimation  of  the  parameters  of  sinusoids  were  first 
derived  by  Rife  and  Boorstyn  [1,7].  Several  useful  asymptotic  approximations  for  the 
Cramer-Rao  bound  have  been  derived  by  Stoica  and  Nehorai  [2,8].  The  Cramer- 
Rao  bound  is  determined  by  the  inverse  of  the  Fisher  information  matrix  J,  whose 
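elements are $J(i, j) = \langle v_i v_j \rangle$. In terms of the log-likelihood function,

\[
J(i, j) = \left\langle \frac{1}{P(\Theta)} \frac{\partial P(\Theta)}{\partial \Theta_i} \, \frac{1}{P(\Theta)} \frac{\partial P(\Theta)}{\partial \Theta_j} \right\rangle
 = \left\langle \frac{\partial F(\Theta)}{\partial \Theta_i} \frac{\partial F(\Theta)}{\partial \Theta_j} \right\rangle .
\]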

For  the  tightest  possible  bound,  all  of  the  unknown  parameters  of  the  model  signal  y 
must  be  included  in  v: 


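\[
\mathbf{v} = \left[
\frac{\partial F}{\partial \sigma^2},\
\frac{\partial F}{\partial a_1},\
\frac{\partial F}{\partial \omega_1},\
\frac{\partial F}{\partial \phi_1},\
\frac{\partial F}{\partial a_2},\ \ldots,\
\frac{\partial F}{\partial \omega_N},\
\frac{\partial F}{\partial \phi_N}
\right]^T .
\]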


As long as all of the sinusoidal frequencies are different, $J$ will be nonsingular, and
since $\langle v_i v_j \rangle = \langle v_j v_i \rangle$, $J$ is symmetric.

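To clarify which elements of $\mathbf{v}$ and $J$ are being referred to, a condensed notation
will be employed, where $v(\Theta_i)$ will stand for $\partial F / \partial \Theta_i$, $J(\Theta_i; \Theta_j)$ will stand for $\langle v(\Theta_i) v(\Theta_j) \rangle$,
and the parameters of the model will be shown explicitly as amplitudes, frequencies,
phases, or standard deviations. For example:

\[
J(a_i; \omega_j) = \left\langle v(a_i) \, v(\omega_j) \right\rangle
 = \left\langle \frac{\partial F}{\partial a_i} \frac{\partial F}{\partial \omega_j} \right\rangle .
\]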

The  elements  of  v are  given  by  the  following  partial  derivatives: 

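\begin{align*}
v(\sigma^2) &= \frac{1}{2\sigma^2} \left( \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k]^2 - L \right) \\
v(a_i) &= \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \cos(\omega_i k + \phi_i) \\
v(\omega_i) &= -\frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \, k \sin(\omega_i k + \phi_i) \\
v(\phi_i) &= -\frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \sin(\omega_i k + \phi_i)
\end{align*}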

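In each of the expectations $\langle v_i v_j \rangle$ which determine $J$, the only random quantities
are the $n[k]$, which are independent zero mean Gaussian random variables. To
simplify the calculation of expected values of the products of the derivatives, the
following results are useful:

\begin{align*}
\langle n[k] \rangle &= 0 \\
\langle n[k] \, n[l] \rangle &=
\begin{cases} \sigma^2, & k = l \\ 0, & k \ne l \end{cases} \\
\langle n[k]^2 \, n[l] \rangle &= 0 \\
\langle n[k]^2 \, n[l] \, n[m] \rangle &=
\begin{cases} 3\sigma^4, & k = l = m \\ \sigma^4, & k \ne l = m \\ 0, & l \ne m \end{cases}
\end{align*}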

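Because $J$ is symmetric, it is only necessary to calculate the elements in its
upper triangle. These are given by:

\begin{align*}
J(\sigma^2; \sigma^2) &= \frac{L}{2\sigma^4} \\
J(\sigma^2; a_i) &= 0 \\
J(\sigma^2; \omega_i) &= 0 \\
J(\sigma^2; \phi_i) &= 0 \\
J(a_i; a_j) &= \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \cos(\omega_i k + \phi_i) \cos(\omega_j k + \phi_j) \\
J(a_i; \omega_j) &= -\frac{a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k \cos(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j) \\
J(a_i; \phi_j) &= -\frac{a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \cos(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j) \\
J(\omega_i; \omega_j) &= \frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k^2 \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j) \\
J(\omega_i; \phi_j) &= \frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j) \\
J(\phi_i; \phi_j) &= \frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
\end{align*}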


Since the expected value of the product of $v(\sigma^2)$ and any other element of $\mathbf{v}$ is
zero, $J$ is block diagonal. Using the fact that

\[
\begin{bmatrix} a & 0 \\ 0 & A \end{bmatrix}^{-1}
 = \begin{bmatrix} 1/a & 0 \\ 0 & A^{-1} \end{bmatrix} ,
\]

the Cramer-Rao bound on estimation of $\sigma^2$ may be found immediately to be $\mathrm{var}(\hat{\sigma}^2) \ge 2\sigma^4 / L$.
The inverse of the submatrix of $J$ containing the other parameters then
determines the bounds on estimation of the magnitudes, frequencies, and phases.

It is possible to simplify the sums of trigonometric expressions in the elements
of $J$ to remove the summations over $k$, but the resulting expressions are still
complex,  and  little  insight  is  gained  by  the  simplification.  (The  results  are  given  by 
Porat  [4,  pp.  262-264].)  For  signals  with  more  than  one  or  two  sinusoidal  compo- 
nents, the  most  effective  approach  is  to  calculate  J using  the  expressions  above,  and 
numerically  compute  its  inverse  to  find  the  Cramer-Rao  bounds. 
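
The numerical route described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names `fisher_matrix` and `cramer_rao_bounds` and the parameter ordering $(a_1, \omega_1, \phi_1, \ldots, a_N, \omega_N, \phi_N)$ are hypothetical, and the $\sigma^2$ row and column are omitted because $J$ is block diagonal. Each element of $J$ is the inner product of two model-derivative sequences divided by $\sigma^2$, which reproduces the sums listed earlier.

```python
import numpy as np

def fisher_matrix(amps, freqs, phases, sigma2, k0, L):
    """Fisher information for (a_1, w_1, phi_1, ..., a_N, w_N, phi_N)."""
    k = np.arange(k0 + 1, k0 + L + 1, dtype=float)
    columns = []
    for a, w, p in zip(amps, freqs, phases):
        columns.append(np.cos(w * k + p))           # derivative of model w.r.t. a_i
        columns.append(-a * k * np.sin(w * k + p))  # derivative w.r.t. omega_i
        columns.append(-a * np.sin(w * k + p))      # derivative w.r.t. phi_i
    G = np.column_stack(columns)
    return G.T @ G / sigma2

def cramer_rao_bounds(amps, freqs, phases, sigma2, k0, L):
    """Diagonal of J^{-1}: lower bounds on the variances, in the same order."""
    J = fisher_matrix(amps, freqs, phases, sigma2, k0, L)
    return np.diag(np.linalg.inv(J))
```

For example, `cramer_rao_bounds([1.0, 1.0], [0.2, 0.3], [0.0, 0.0], 0.1, 0, 39)` returns six bounds, with the frequency bounds at positions 1 and 4.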

For  the  case  of  a single  sinusoid,  however,  useful  expressions  may  be  analyti- 
cally derived.  In  this  case,  the  bounds  are  determined  by  a submatrix  of  J: 


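\[
\frac{1}{\sigma^2}
\begin{bmatrix}
\sum \cos^2(\omega k + \phi) & -\frac{a}{2} \sum k \sin(2\omega k + 2\phi) & -\frac{a}{2} \sum \sin(2\omega k + 2\phi) \\
-\frac{a}{2} \sum k \sin(2\omega k + 2\phi) & a^2 \sum k^2 \sin^2(\omega k + \phi) & a^2 \sum k \sin^2(\omega k + \phi) \\
-\frac{a}{2} \sum \sin(2\omega k + 2\phi) & a^2 \sum k \sin^2(\omega k + \phi) & a^2 \sum \sin^2(\omega k + \phi)
\end{bmatrix}
\]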
The Cramer-Rao bounds on estimation of $a$, $\omega$, and $\phi$ are given by the diagonal
elements of the inverse of this matrix; by setting terms that are small for large $L$ to
zero and computing the inverse, an asymptotic Cramer-Rao bound, useful for large
data records, is obtained.

A surprising  feature  of  the  result  of  this  computation,  first  noted  by  Rife  [1], 
is  that  the  bound  on  the  variance  of  the  phase  estimate  depends  on  the  location  of 
the point $k = 0$ in relation to the sample (i.e., the value of $k_0$). If the first sample is
chosen to be $k = 0$, then the off-diagonal terms given by the sum $a^2 \sum k \sin^2(\omega k + \phi)$
do not become negligible as $L$ becomes large; the value of the diagonal term $J(\omega; \omega)$
is  also  affected  by  the  choice  of  origin.  The  fact  that  the  off-diagonal  terms  do  not 
become  negligible  for  large  L corresponds  to  a coupling  between  the  frequency  and 
phase  estimates.  If  the  origin  is  chosen  to  be  at  the  center  of  the  observed  data, 
the  coupling  term  approaches  zero  as  L increases,  indicating  that  the  frequency  and 
phase  estimates  are  asymptotically  decoupled. 

Although  this  effect  is  surprising,  it  has  a natural  explanation.  While  the 
frequency  and  amplitude  of  a sinusoid  are  translation  invariant,  the  phase  is  defined 
with respect to a fixed location, the point $k = 0$. The mechanism responsible for the
coupling  between  phase  and  frequency  becomes  clear  if  we  consider  the  problem  of 
estimating  the  phase  of  a signal  from  data  on  an  interval  that  does  not  include  k = 0. 
It  is  more  difficult  to  estimate  the  phase  of  a signal  at  a point  far  from  the  center  of 
the  measurement  interval  because  this  requires  some  degree  of  extrapolation,  which 
increases  the  uncertainty  of  the  phase  estimate.  The  coupling  between  the  phase 
and  frequency  estimates  is  simply  a consequence  of  the  fact  that  knowledge  of  the 
frequency  is  necessary  to  determine  the  phase. 

The  asymptotic  value  of  the  Cramer-Rao  bound  for  the  frequency  estimate  is 
not  altered  by  the  choice  of  origin,  but  the  bound  for  phase  estimation  is  affected; 
in  accordance  with  the  intuitive  justification  given  above,  it  can  be  shown  that  the 
minimum value is obtained when the origin is at the center of the data record [1].
The  asymptotic  bound  for  phase  estimation  is  increased  by  a factor  of  four  if  k = 0 is 
taken  to  be  at  the  start  of  the  data  record,  rather  than  at  the  center;  this  effect  has 
apparently  been  neglected  in  several  derivations  of  the  Cramer-Rao  bound  for  phase 
estimation  [3,9]. 

For  finite  L,  the  bounds  for  both  frequency  and  phase  depend  on  the  choice  of 
origin,  and  the  minimum  value  is  reached  when  the  origin  is  taken  to  be  the  center 
of  the  measurement  interval.  Under  this  assumption,  as  the  number  of  samples  L 
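becomes large, the matrix can be approximated by

\[
\frac{1}{\sigma^2}
\begin{bmatrix}
L/2 & 0 & 0 \\
0 & a^2 L (L - 1)^2 / 24 & 0 \\
0 & 0 & a^2 L / 2
\end{bmatrix}
\]

and the Cramer-Rao bounds on estimation of $a$, $\omega$, and $\phi$ approach the diagonal
elements of

\[
\sigma^2
\begin{bmatrix}
2/L & 0 & 0 \\
0 & 24 / \left( a^2 L (L - 1)^2 \right) & 0 \\
0 & 0 & 2 / (a^2 L)
\end{bmatrix} .
\]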

The  asymptotic  (for  large  L)  Cramer-Rao  bounds  on  the  estimation  of  the  parameters 
of  a single  real  sinusoid  in  additive  white  Gaussian  noise  are  therefore  given  by 


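\begin{align*}
\mathrm{var}(\hat{a}) &\ge \frac{2\sigma^2}{L} , \\
\mathrm{var}(\hat{\omega}) &\ge \frac{24\sigma^2}{a^2 L^3} , \\
\mathrm{var}(\hat{\phi}) &\ge \frac{2\sigma^2}{a^2 L} .
\end{align*}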
In terms of the signal-to-noise ratio $\eta = a^2 / 2\sigma^2$, the bound on the accuracy of the
frequency estimate is

\[
\mathrm{var}(\hat{\omega}) \ge \frac{12}{\eta L^3} ,
\]

or, in terms of the cycle frequency $f$,

\[
\mathrm{var}(\hat{f}) \ge \frac{3}{\pi^2 \eta L^3} .
\]


Note that the minimum variance of any unbiased frequency estimate decreases asymptotically
as the inverse cube of the number of samples. This is a powerful argument for
maximizing  the  number  of  samples  used,  since  it  indicates  that  the  simultaneous  use 
of  many  consecutive  samples  allows  the  variance  to  be  reduced  very  rapidly.  If  the 
samples  are  not  consecutive,  as  in  the  multiple  snapshot  case  in  bearing  estimation, 
the  variance  decreases  only  as  the  inverse  of  the  number  of  snapshots  [10]. 
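
The asymptotic expressions above are simple enough to tabulate directly. The sketch below is illustrative only; the function name `asymptotic_crb` and the dictionary keys are assumptions introduced here.

```python
import math

def asymptotic_crb(a, sigma2, L):
    """Large-L Cramer-Rao bounds for one real sinusoid in white Gaussian noise.

    Assumes the time origin is at the center of the data record, as in the text.
    """
    snr = a * a / (2.0 * sigma2)  # eta = a^2 / (2 sigma^2)
    return {
        "amplitude": 2.0 * sigma2 / L,
        "omega": 12.0 / (snr * L ** 3),
        "cycle_frequency": 3.0 / (math.pi ** 2 * snr * L ** 3),
        "phase": 2.0 * sigma2 / (a * a * L),
    }
```

Doubling the record length reduces the frequency bound by a factor of eight, while the amplitude and phase bounds are only halved.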

Derivation of the Bhattacharyya Bound

To  calculate  the  Bhattacharyya  bound,  additional  higher-order  derivatives  of 
the likelihood function are needed. Since $P(\Theta)$ is infinitely differentiable, all partial
derivatives with respect to the same set of parameters are equal, regardless of the
order of differentiation. For second derivatives, this means that, for example,
$\partial^2 P / \partial\Theta_i \partial\Theta_j = \partial^2 P / \partial\Theta_j \partial\Theta_i$. For the Bhattacharyya
information matrix $K$ to be nonsingular, only one derivative with respect to each set
of parameters may be included.

The  elements  of  w and  K will  be  referenced  with  the  same  notation  used  for 
v and  J,  so  that 

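\[
w(\Theta_i, \Theta_j) = \frac{1}{P(\Theta)} \frac{\partial^2 P(\Theta)}{\partial\Theta_i \partial\Theta_j} ,
\]

and

\[
K(\Theta_i, \Theta_j; \Theta_k, \Theta_l) = \left\langle w(\Theta_i, \Theta_j) \, w(\Theta_k, \Theta_l) \right\rangle .
\]

(Note that $w(\Theta_i) = v(\Theta_i)$ and $K(\Theta_i; \Theta_j) = J(\Theta_i; \Theta_j)$.)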

In  terms  of  the  log-likelihood  function,  w is  given  by 


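\[
w(\Theta_i, \Theta_j) = \frac{1}{P(\Theta)} \frac{\partial^2 P(\Theta)}{\partial\Theta_i \partial\Theta_j}
 = \frac{\partial^2 F(\Theta)}{\partial\Theta_i \partial\Theta_j}
 + \frac{\partial F(\Theta)}{\partial\Theta_i} \frac{\partial F(\Theta)}{\partial\Theta_j} .
\]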

For  the  tightest  possible  second  order  Bhattacharyya  bound,  all  linearly  independent 
first  and  second  derivatives  of  P(0)  must  be  included  in  w.  These  are  given  by: 


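\begin{align*}
w(\sigma^2) &= \frac{1}{2\sigma^2} \left( \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k]^2 - L \right) \\
w(a_i) &= \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \cos(\omega_i k + \phi_i) \\
w(\omega_i) &= -\frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \, k \sin(\omega_i k + \phi_i) \\
w(\phi_i) &= -\frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \sin(\omega_i k + \phi_i) \\
w(\sigma^2, \sigma^2) &= \frac{L^2 + 2L}{4\sigma^4} - \frac{L + 2}{2\sigma^6} \sum_{k=k_0+1}^{k_0+L} n[k]^2
 + \frac{1}{4\sigma^8} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l]^2 \\
w(\sigma^2, a_i) &= -\frac{L + 2}{2\sigma^4} \sum_{k=k_0+1}^{k_0+L} n[k] \cos(\omega_i k + \phi_i)
 + \frac{1}{2\sigma^6} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l] \cos(\omega_i l + \phi_i) \\
w(\sigma^2, \omega_i) &= a_i \frac{L + 2}{2\sigma^4} \sum_{k=k_0+1}^{k_0+L} n[k] \, k \sin(\omega_i k + \phi_i)
 - \frac{a_i}{2\sigma^6} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l] \, l \sin(\omega_i l + \phi_i) \\
w(\sigma^2, \phi_i) &= a_i \frac{L + 2}{2\sigma^4} \sum_{k=k_0+1}^{k_0+L} n[k] \sin(\omega_i k + \phi_i)
 - \frac{a_i}{2\sigma^6} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l] \sin(\omega_i l + \phi_i) \\
w(a_i, a_j) &= -\frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \cos(\omega_i k + \phi_i) \cos(\omega_j k + \phi_j) \\
 &\quad + \frac{1}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \cos(\omega_i k + \phi_i) \cos(\omega_j l + \phi_j) \\
w(a_i, \omega_j) &= \frac{a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k \cos(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
 - \delta_{ij} \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \, k \sin(\omega_i k + \phi_i) \\
 &\quad - \frac{a_j}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \, l \cos(\omega_i k + \phi_i) \sin(\omega_j l + \phi_j) \\
w(a_i, \phi_j) &= \frac{a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \cos(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
 - \delta_{ij} \frac{1}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \sin(\omega_i k + \phi_i) \\
 &\quad - \frac{a_j}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \cos(\omega_i k + \phi_i) \sin(\omega_j l + \phi_j) \\
w(\omega_i, \omega_j) &= -\frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k^2 \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
 - \delta_{ij} \frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \, k^2 \cos(\omega_i k + \phi_i) \\
 &\quad + \frac{a_i a_j}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \, k \, l \sin(\omega_i k + \phi_i) \sin(\omega_j l + \phi_j) \\
w(\omega_i, \phi_j) &= -\frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} k \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
 - \delta_{ij} \frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \, k \cos(\omega_i k + \phi_i) \\
 &\quad + \frac{a_i a_j}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \, k \sin(\omega_i k + \phi_i) \sin(\omega_j l + \phi_j) \\
w(\phi_i, \phi_j) &= -\frac{a_i a_j}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} \sin(\omega_i k + \phi_i) \sin(\omega_j k + \phi_j)
 - \delta_{ij} \frac{a_i}{\sigma^2} \sum_{k=k_0+1}^{k_0+L} n[k] \cos(\omega_i k + \phi_i) \\
 &\quad + \frac{a_i a_j}{\sigma^4} \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \sin(\omega_i k + \phi_i) \sin(\omega_j l + \phi_j)
\end{align*}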

To  simplify  the  calculation  of  K from  w,  it  is  helpful  to  place  the  elements  of 
w in  a standard  form.  Each  element  of  w may  be  written  as: 


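\begin{align*}
w(\Theta_i, \Theta_j) = {} & A(\Theta_i, \Theta_j)
 + \sum_{k=k_0+1}^{k_0+L} n[k] \, B(\Theta_i, \Theta_j)[k] \\
 & + \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k] n[l] \, C(\Theta_i, \Theta_j)[k, l] \\
 & + D(\Theta_i, \Theta_j) \sum_{k=k_0+1}^{k_0+L} n[k]^2 \\
 & + \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l] \, E(\Theta_i, \Theta_j)[l] \\
 & + F(\Theta_i, \Theta_j) \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} n[k]^2 n[l]^2
\end{align*}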

Using this form, each element of $K$ may be calculated from the coefficients of the two
relevant elements of $\mathbf{w}$. The value of $K(\Theta_i, \Theta_j; \Theta_k, \Theta_l)$ is the sum of the expected values
of the components of the cross product of $w(\Theta_i, \Theta_j)$ and $w(\Theta_k, \Theta_l)$. Since each $w$ is a sum
of terms with the coefficients $A$, $B$, $C$, $D$, $E$, and $F$ defined above, the cross product
can be expressed in terms of these coefficients. The expected value of the product of
the $A$ term in $w(\Theta_i, \Theta_j)$ and the $C$ term in $w(\Theta_k, \Theta_l)$ will be denoted $K_{AC}$, and similarly
for each of the other terms. Using the fact that the expected value of any odd power
of $n[k]$ is zero, we can easily see that many of these components are zero, as shown
in the chart below.
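\[
\begin{array}{c|cccccc}
 & A(\Theta_i, \Theta_j) & B(\Theta_i, \Theta_j) & C(\Theta_i, \Theta_j) & D(\Theta_i, \Theta_j) & E(\Theta_i, \Theta_j) & F(\Theta_i, \Theta_j) \\
\hline
A(\Theta_k, \Theta_l) & K_{AA} & 0 & K_{AC} & K_{AD} & 0 & K_{AF} \\
B(\Theta_k, \Theta_l) & 0 & K_{BB} & 0 & 0 & K_{BE} & 0 \\
C(\Theta_k, \Theta_l) & K_{CA} & 0 & K_{CC} & K_{CD} & 0 & K_{CF} \\
D(\Theta_k, \Theta_l) & K_{DA} & 0 & K_{DC} & K_{DD} & 0 & K_{DF} \\
E(\Theta_k, \Theta_l) & 0 & K_{EB} & 0 & 0 & K_{EE} & 0 \\
F(\Theta_k, \Theta_l) & K_{FA} & 0 & K_{FC} & K_{FD} & 0 & K_{FF}
\end{array}
\]
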

To determine the values of each of these components, we must calculate expected
values of summations of products of the $n[k]$. In any quadruple summation
with indices ranging over $L$ values, there will be

    $L$                          terms with all 4 indices identical,
    $4L(L - 1)$                  terms with only 3 indices identical,
    $3L(L - 1)$                  terms with 2 pairs of indices identical,
    $6L(L - 1)(L - 2)$           terms with only 1 pair of identical indices, and
    $L(L - 1)(L - 2)(L - 3)$     terms with all indices different.

Similarly, in any triple summation, there will be

    $L$                          terms with all 3 indices identical,
    $3L(L - 1)$                  terms with only 2 indices identical, and
    $L(L - 1)(L - 2)$            terms with all indices different.
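
These counts can be confirmed by brute-force enumeration for a small $L$. The sketch below is illustrative; the function name `index_pattern_counts` and the tuple encoding of each pattern are assumptions introduced here, not notation from the text.

```python
from collections import Counter
from itertools import product

def index_pattern_counts(L, order):
    """Count index tuples of the given order by their multiplicity pattern.

    A pattern lists how often each distinct index value occurs, largest
    first: (4,) is "all four identical", (2, 2) is "two pairs", and
    (2, 1, 1) is "only one pair of identical indices".
    """
    counts = Counter()
    for indices in product(range(L), repeat=order):
        pattern = tuple(sorted(Counter(indices).values(), reverse=True))
        counts[pattern] += 1
    return counts
```

The enumeration visits all $L^4$ (or $L^3$) tuples, so it is only practical for small $L$, which is sufficient for checking the polynomial counts.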
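Finally, in addition to the expected values of products of the $n[k]$ listed earlier, we
will also require the following values:

\begin{align*}
\langle n[i]^2 n[j]^2 n[k] \rangle &= 0 \\
\langle n[i]^2 n[j]^2 n[k]^2 \rangle &=
\begin{cases}
15\sigma^6, & i = j = k \\
3\sigma^6, & \text{any two indices equal} \\
\sigma^6, & \text{all indices different}
\end{cases} \\
\langle n[i]^2 n[j]^2 n[k]^2 n[l] \rangle &= 0 \\
\langle n[i]^2 n[j]^2 n[k]^2 n[l]^2 \rangle &=
\begin{cases}
105\sigma^8, & i = j = k = l \\
15\sigma^8, & \text{any three indices equal} \\
9\sigma^8, & \text{two pairs of indices equal} \\
3\sigma^8, & \text{one pair of indices equal} \\
\sigma^8, & \text{all indices different}
\end{cases}
\end{align*}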
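
The even-moment values above follow from the standard result that a zero mean Gaussian variable with variance $\sigma^2$ has $\langle n^{2r} \rangle = (2r - 1)!!\,\sigma^{2r}$, together with the independence of samples at different indices. The following sketch reproduces the table; the function names are hypothetical and introduced only for illustration.

```python
from collections import Counter

def double_factorial_odd(r):
    """Return (2r - 1)!! = 1 * 3 * ... * (2r - 1); the empty product is 1."""
    result = 1
    for m in range(1, 2 * r, 2):
        result *= m
    return result

def squared_product_moment(indices, sigma2=1.0):
    """Expected value of the product of n[i]^2 over the given indices.

    Samples at distinct indices are independent, so the expectation
    factors; an index that appears r times contributes <n^(2r)>.
    """
    moment = 1.0
    for multiplicity in Counter(indices).values():
        moment *= double_factorial_odd(multiplicity) * sigma2 ** multiplicity
    return moment
```

For instance, `squared_product_moment((0, 0, 1, 1))` corresponds to the "two pairs of indices equal" case.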


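Using these properties, we can determine the value of each of the constants
in the cross product. Since $K_{XY}$ and $K_{YX}$ are simply related, we will only list the
unique elements:

\begin{align*}
K_{AA} &= A(\Theta_i, \Theta_j) \, A(\Theta_k, \Theta_l) \\
K_{AC} &= \sigma^2 A(\Theta_k, \Theta_l) \sum_{k=k_0+1}^{k_0+L} C(\Theta_i, \Theta_j)[k, k] \\
K_{AD} &= L \sigma^2 A(\Theta_k, \Theta_l) \, D(\Theta_i, \Theta_j) \\
K_{AF} &= (L^2 + 2L) \, \sigma^4 A(\Theta_k, \Theta_l) \, F(\Theta_i, \Theta_j) \\
K_{BB} &= \sigma^2 \sum_{k=k_0+1}^{k_0+L} B(\Theta_i, \Theta_j)[k] \, B(\Theta_k, \Theta_l)[k] \\
K_{BE} &= (L + 2) \, \sigma^4 \sum_{k=k_0+1}^{k_0+L} B(\Theta_k, \Theta_l)[k] \, E(\Theta_i, \Theta_j)[k] \\
K_{CC} &= \sigma^4 \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} C(\Theta_i, \Theta_j)[k, k] \, C(\Theta_k, \Theta_l)[l, l] \\
 &\quad + \sigma^4 \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} C(\Theta_i, \Theta_j)[k, l] \, C(\Theta_k, \Theta_l)[k, l] \\
 &\quad + \sigma^4 \sum_{k=k_0+1}^{k_0+L} \sum_{l=k_0+1}^{k_0+L} C(\Theta_i, \Theta_j)[k, l] \, C(\Theta_k, \Theta_l)[l, k] \\
K_{CD} &= (L + 2) \, \sigma^4 D(\Theta_i, \Theta_j) \sum_{k=k_0+1}^{k_0+L} C(\Theta_k, \Theta_l)[k, k] \\
K_{CF} &= (L^2 + 6L + 8) \, \sigma^6 F(\Theta_i, \Theta_j) \sum_{k=k_0+1}^{k_0+L} C(\Theta_k, \Theta_l)[k, k] \\
K_{DD} &= (L^2 + 2L) \, \sigma^4 D(\Theta_i, \Theta_j) \, D(\Theta_k, \Theta_l) \\
K_{DF} &= L (L^2 + 6L + 8) \, \sigma^6 D(\Theta_k, \Theta_l) \, F(\Theta_i, \Theta_j) \\
K_{EE} &= (L^2 + 6L + 8) \, \sigma^6 \sum_{k=k_0+1}^{k_0+L} E(\Theta_i, \Theta_j)[k] \, E(\Theta_k, \Theta_l)[k] \\
K_{FF} &= (L^4 + 12L^3 + 44L^2 + 48L) \, \sigma^8 F(\Theta_i, \Theta_j) \, F(\Theta_k, \Theta_l)
\end{align*}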
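
The polynomial factors in $K_{DD}$, $K_{DF}$, and $K_{FF}$ are the moments of the sum $\sum_k n[k]^2$, which for independent Gaussian samples is $\sigma^2$ times a chi-square variable with $L$ degrees of freedom; its $p$-th moment is $\sigma^{2p} L (L + 2) \cdots (L + 2p - 2)$. The sketch below checks the coefficients against that known result; the function name `power_sum_moment` is an assumption made for illustration.

```python
def power_sum_moment(L, p):
    """Return <(sum_k n[k]^2)^p> / sigma^(2p) for L i.i.d. zero mean Gaussian samples.

    Uses the chi-square moment formula L (L + 2) ... (L + 2p - 2).
    """
    value = 1
    for i in range(p):
        value *= L + 2 * i
    return value
```

Dividing the third moment by $L$ gives $(L + 2)(L + 4) = L^2 + 6L + 8$, the factor that appears in $K_{CF}$ and $K_{EE}$.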

We  now  have  the  information  necessary  to  calculate  the  Bhattacharyya  bound 
for  estimation  of  the  amplitudes,  frequencies,  and  phases  of  sinusoidal  signals  in  white 
Gaussian  noise.  It  is  clearly  impractical  to  calculate  the  bound  analytically,  but  the 
equations  above  may  be  easily  implemented  using  a computer  to  numerically  calculate 
bounds  for  specific  signals.  It  is  important  to  note  that,  when  the  signal-to-noise  ratio 
is  high,  or  the  number  of  samples  very  large,  the  Bhattacharyya  information  matrix 
may  be  ill  conditioned;  however,  in  this  case  the  Bhattacharyya  bound  is  very  close 
to  the  Cramer-Rao  bound,  and  numerical  errors  in  the  computation  of  the  bound  are 
easily  detected.  Nevertheless,  it  is  wise  to  compute  the  bound  in  double  precision, 
and  monitor  the  condition  of  the  Bhattacharyya  matrix  to  ensure  that  the  results  are 
of  acceptable  accuracy. 

The  results  of  this  calculation,  and  a discussion  of  the  circumstances  in  which 
the  Bhattacharyya  bound  is  a significant  improvement  over  the  Cramer-Rao  bound, 
are  given  in  the  following  section. 
Comparison of Bounds

We  have  seen  above  that  the  Bhattacharyya  bounds  on  frequency  estimation 
are  tighter  than  the  Cramer-Rao  bounds;  in  this  section,  we  will  compare  the  two 
bounds  to  determine  when  the  difference  is  significant.  Since  existing  frequency  es- 
timation techniques  approach  the  Cramer-Rao  bound  very  closely  when  there  are 
many  samples,  the  signal-to-noise  ratio  is  high,  and  the  sinusoids  in  the  input  data 
are  widely  separated,  we  do  not  expect  to  find  a significant  difference  between  the 
two  bounds  under  these  circumstances.  Substantial  differences  are  found,  however, 
in  cases  where  the  number  of  samples  is  small,  the  signal-to-noise  ratio  is  low,  or  the 
data  contain  sinusoids  whose  frequencies  are  close. 

First,  we  will  examine  the  effect  of  variations  in  the  number  of  samples  on  the 
Cramer-Rao  and  Bhattacharyya  bounds.  Figure  2.1  shows  the  Cramer-Rao  bound 
and  the  Bhattacharyya  bound  on  the  variance  of  a frequency  estimator  for  a single 
sinusoid as the number of samples varies. The parameters of the sinusoid are $a = 1$,
$\omega = 0.2$, $\phi = 0$, and $\sigma^2 = 30$.

Figure 2.1 clearly shows the extremely rapid decrease in both bounds as the
number of samples increases. This decrease is asymptotically proportional to $L^{-3}$,
and the rate is close to this asymptotic limit even for relatively small data records.
Variations about the asymptotic rate occur for small record lengths because of the
effect of fractional cycles of the signal in the data record. Notice that the period
of the perturbation around the asymptotic line is the same as the signal period,
approximately 31 samples; in the example shown, the signal is $\cos(0.2k)$, and the
bounds decrease most slowly when $L$ is a multiple of $10\pi$. The exact shape of the
perturbation about the asymptotic limit is dependent on the frequency and the phase
of the signal, but a similar effect occurs for any choice of these parameters.

The largest difference between the Cramer-Rao and Bhattacharyya bounds
occurs for small data records. For the example shown, the Bhattacharyya bound on
the variance is twice as large as the Cramer-Rao bound when $L = 60$, and five times
as large when $L = 20$. The difference would of course be greater than this for smaller
$L$, but bounds on the variance that are greater than 0.8 have little meaning; a variance
of $\pi^2/12 \approx 0.822$ can be achieved with no computation at all, by randomly choosing
an $\hat{\omega}$ uniformly distributed between 0 and $\pi$. As the number of samples becomes large,
the two bounds approach equality; however, for finite $L$ the Bhattacharyya bound is
always larger than the Cramer-Rao bound. This substantiates the earlier assertion
that no estimator of sinusoidal frequencies can be efficient for finite $L$.

Figure 2.1: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds on the
Variance of $\hat{\omega}$ for a Single Sinusoid

Next, we will examine the effect of frequency variation on the bounds. Figure 2.2
shows the two bounds for a single signal with varying frequency, and $a = 1$,
$\phi = 0$, $\sigma^2 = 0.1$, and $L = 19$. The two bounds differ significantly only for values of $\omega$
close to 0 or $\pi$. A detailed view of the low frequency region is shown in Figure 2.3.

The increase in the bounds on estimation when $\omega$ approaches 0 or $\pi$ is a
consequence  of  the  fact  that  signals  at  these  frequencies  are  sums  of  closely  spaced 
complex  sinusoids.  The  presence  of  another  signal  at  a nearby  frequency  decreases 
the  accuracy  to  which  either  frequency  may  be  determined.  We  shall  see  below  that 
a similar  effect  occurs  for  closely  spaced  real  sinusoids. 

The final example with a single sinusoid, Figure 2.4, shows the effect of changes
in the signal-to-noise ratio. For this example, $a = 1$, $\omega = 0.2$, $\phi = 0$, and $L = 39$.
Since the power in a real sinusoid with amplitude $a$ is $a^2/2$, the signal-to-noise ratio
is $a^2/2\sigma^2$.

The bounds on estimation of the frequency increase if additional sinusoids
are added to the signal, and the change in the bounds depends on the frequency
separation of the two sinusoids. Figure 2.5 shows the bounds for a sinusoid with the
same parameters as used for Figure 2.4 ($a_1 = 1$, $\omega_1 = 0.2$, $\phi_1 = 0$), when a second
sinusoid with parameters $a_2 = 1$, $\omega_2 = 0.3$, and $\phi_2 = 0$ is present, for $L = 39$. The
addition of the second signal has caused the bounds on estimating the frequency of
the first signal to increase by a factor of more than ten; it has also caused the ratio
between the two bounds to increase. Note that the separation between the sinusoids
is $\Delta f = 0.1/\pi = 1/31.42 \approx 1/L$, so the two signals are relatively close in frequency.

Figure 2.2: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds for a
Single Sinusoid of Varying Frequency

Figure 2.3: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds for a
Single Sinusoid at Low Frequencies

Figure 2.4: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds for a
Single Sinusoid as a Function of Signal-to-Noise Ratio

The effect of changes in the frequency separation is shown in Figure 2.6. As
the frequency separation is reduced, the bounds on the variance of estimators increase
dramatically. In this example, $a_1 = a_2 = 1$, $\omega_1 = 0.2$, $\phi_1 = \phi_2 = 0$, $L = 39$, $\sigma^2 = 0.1$,
and $\omega_2$ varies between 0.2 and 0.5. (The fact that the increase in the bounds occurs
near the resolution limit for the classical spectral estimation techniques, $\Delta f \approx 1/L$,
is coincidental; if the noise power is reduced, the increase occurs at smaller frequency
separations.)

The  Bhattacharyya  bound  derived  in  this  chapter  is  a tighter  bound  for  the 
variance  of  an  unbiased  estimator  of  the  frequencies  of  real  sinusoids  than  the  widely 
known Cramer-Rao bound. The difference between the two bounds is most pronounced
under three circumstances: when the number of samples is small, when the
signal-to-noise ratio is low, or when the signal contains multiple closely spaced real or
complex sinusoids. The Bhattacharyya bound is therefore most useful in evaluating
the  performance  of  frequency  estimators  when  these  conditions  apply. 

Figure 2.5: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds for
One of Two Sinusoids as a Function of Signal-to-Noise Ratio

Figure 2.6: Cramer-Rao (dashed line) and Bhattacharyya (solid line) Bounds for
One of Two Sinusoids as a Function of Frequency Difference


CHAPTER  3 

SUBSPACE  ESTIMATION  TECHNIQUES 

All  subspace  estimators  are  based  on  the  properties  of  invariant  subspaces  of 
autocorrelation  matrices.  These  properties  give  the  subspace  techniques  their  high 
performance,  but  are  also  responsible  for  their  high  computational  cost,  since  finding 
eigenvalues or eigenvectors of a general $M \times M$ matrix requires $O(M^3)$ operations. In
the  following  section,  the  most  widely  used  subspace  methods  for  frequency  estimation 
are  described,  and  the  computational  techniques  used  are  detailed.  This  will  provide 
a framework  for  later  discussion  of  fast  subspace  techniques. 

Eigendecomposition  of  Exact  Autocorrelation  Matrices 

If  the  autocorrelation  matrix  R of  the  input  signal  is  known  exactly,  the  sub- 
space techniques  can  determine  the  frequencies  of  sinusoids  in  the  input  signal  exactly. 
Of  course,  in  practice  the  autocorrelation  matrix  is  not  known  precisely,  but  must  be 
estimated  from  the  data.  Nevertheless,  the  principles  on  which  the  subspace  tech- 
niques are  based  may  be  seen  most  clearly  by  beginning  with  the  simplest  case,  where 
we  assume  exact  knowledge  of  the  autocorrelation  matrix. 

Although  the  focus  of  this  work  is  on  real-valued  signals,  in  this  chapter  com- 
plex signals  are  assumed  for  notational  convenience.  For  any  of  the  expressions  given 
here  for  complex  signals,  equivalent  expressions  for  real  signals  may  be  found  by  ex- 
pressing each  real  sinusoid  as  the  sum  of  2 complex  sinusoids.  The  expressions  for 
real  signals  that  are  needed  in  later  chapters  will  be  given  as  they  are  developed. 
Noise-Free Signals

We  will  begin  with  the  simplest  case,  where  no  noise  is  present.  Consider  a 
discrete-time  signal  composed  of  N complex  sinusoids: 

\[
x[k] = \sum_{i=1}^{N} a_i \exp(j\omega_i k), \tag{3.1}
\]

where $a_i \ne 0$ and $-\pi < \omega_i < \pi$ for each $i$, and $\omega_i \ne \omega_j$ if $i \ne j$. This is the complex
form of the deterministic signal model described in Chapter 2. The elements of the
autocorrelation matrix of this signal are given by

\begin{align*}
R_{xx}(i, j) &= \left\langle x[k_0 + i]^* \, x[k_0 + j] \right\rangle \\
 &= x[k_0 + i]^* \, x[k_0 + j].
\end{align*}

Because it is an autocorrelation, $R_{xx}$ is Hermitian; for real data, $R_{xx}$ is symmetric.
When $x[k]$ is of the form given by (3.1), $R_{xx}$ may also be written as

\[
R_{xx} = S S^* \tag{3.2}
\]

where the columns of the $M \times N$ matrix $S$ are defined by

\[
\tilde{\mathbf{s}}_i = a_i \exp(j\omega_i k_0) \, \mathbf{s}_i ,
\]

and $\mathbf{s}_i$ is the vector valued function (of length $M$)

\[
\mathbf{s}_i = \mathbf{s}(\omega_i) = \left[ 1, \exp(j\omega_i), \exp(j2\omega_i), \ldots, \exp(j(M - 1)\omega_i) \right]^T .
\]

The autocorrelation of a noise-free signal composed of sinusoids is thus equal to the
outer product of a set of $N$ vectors of length $M$; and the frequencies that define
those vectors are the same as the frequencies of the sinusoids in the signal. If the
input signal is composed of $N_r$ real sinusoids, then $R_{xx}$ can be expressed as a sum of
$N = 2N_r$ complex sinusoids, so the rank of $R_{xx}$ is twice the number of real signals.

For $M > 1$ and $-\pi < \omega_1, \omega_2 < \pi$, $\mathbf{s}(\omega_1)$ and $\mathbf{s}(\omega_2)$ are linearly independent if
$\omega_1 \ne \omega_2$. If $M > N$ and all of the $\omega_i$ are distinct, the $N$ vectors $\mathbf{s}_i$ will be linearly
independent, and $R_{xx}$ will have rank $N$. An eigendecomposition of $R_{xx}$ is

\[
R_{xx} = \sum_{i=1}^{M} \lambda_i \mathbf{e}_i \mathbf{e}_i^H \tag{3.3}
\]

where we will also require that the eigenvectors are ordered so $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$.
Note that since it is an autocorrelation matrix, $R_{xx}$ is positive semidefinite, so all of its
eigenvalues $\lambda_i$ are nonnegative. With the eigenvalues of $R_{xx}$ arranged in descending
order, only the first $N$ are nonzero.
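
The rank property can be checked numerically for a small case. This is a sketch under stated assumptions: the function names `steering_vector` and `noise_free_autocorrelation`, and the example amplitudes and frequencies used with them, are hypothetical and chosen only for illustration of the form (3.2).

```python
import numpy as np

def steering_vector(omega, M):
    """s(omega) = [1, exp(j omega), ..., exp(j (M - 1) omega)]^T."""
    return np.exp(1j * omega * np.arange(M))

def noise_free_autocorrelation(amps, freqs, M, k0=0):
    """Form R_xx = S S^*, where column i of S is a_i exp(j w_i k0) s(w_i)."""
    S = np.column_stack([a * np.exp(1j * w * k0) * steering_vector(w, M)
                         for a, w in zip(amps, freqs)])
    return S @ S.conj().T

# Hypothetical example: N = 2 complex sinusoids observed through M = 6 lags.
R_xx = noise_free_autocorrelation([1.0, 0.5], [0.4, 1.3], M=6)
eigenvalues = np.linalg.eigvalsh(R_xx)[::-1]  # descending order
```

With two distinct frequencies, only the first two entries of `eigenvalues` are numerically nonzero.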

Signals  in  White  Noise 

Now  suppose  that  x[fc]  is  corrupted  by  an  additive  noise  signal  e[k\  that  is 
uncorrelated  with  x[k].  The  received  signal  is  given  by: 

N 

y[k]  exp(. jutk)  + e[k\. 

i= 1 

If the noise $e[k]$ is white with variance $\sigma_e^2$, the autocorrelation matrix $R_{yy}$ of the received signal will be

$$R_{yy} = R_{xx} + \sigma_e^2 I. \qquad (3.4)$$

Using (3.3) and the fact that the eigenvectors of $R_{xx}$ are orthonormal,

$$I = \sum_{i=1}^{M} e_i e_i^H,$$

we can write (3.4) as

$$R_{yy} = \sum_{i=1}^{N} \lambda_i e_i e_i^H + \sigma_e^2 \sum_{i=1}^{M} e_i e_i^H.$$

This shows that the eigenvectors of $R_{xx}$ are also eigenvectors of $R_{yy}$, and that an eigendecomposition of $R_{yy}$ is given by:

$$R_{yy} = \sum_{i=1}^{N} (\lambda_i + \sigma_e^2)\, e_i e_i^H + \sum_{i=N+1}^{M} \sigma_e^2\, e_i e_i^H.$$


Since the first $N$ $\lambda_i$ are all greater than zero, the $N$ largest eigenvalues of $R_{yy}$ are greater than $\sigma_e^2$ and the remaining $M - N$ eigenvalues are equal to $\sigma_e^2$. The eigenvectors corresponding to the $N$ largest eigenvalues of $R_{yy}$ are referred to as the signal eigenvectors, and the remaining eigenvectors of $R_{yy}$ are referred to as the noise eigenvectors. The invariant subspaces spanned by these two groups of eigenvectors are referred to as the signal and noise subspaces, respectively. Two important properties of this decomposition are the foundation for the subspace methods.
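The eigenvalue structure described above can be checked numerically. The following is a minimal NumPy sketch, not part of the original development: the frequencies, amplitudes, and noise variance are arbitrary illustrative values, the helper name `steering` is hypothetical, and the phase factor $\exp(j\omega_i k_0)$ is omitted because it cancels in $SS^{*}$.

```python
import numpy as np

def steering(omega, M):
    """Return s(omega) = [1, e^{j omega}, ..., e^{j (M-1) omega}]^T."""
    return np.exp(1j * omega * np.arange(M))

M, sigma2 = 8, 0.1                      # matrix size and noise variance (illustrative)
omegas = np.array([0.5, 1.3, -2.0])     # N = 3 distinct frequencies (illustrative)
amps = np.array([1.0, 0.7, 1.5])        # amplitudes a_i (illustrative)
N = len(omegas)

# Columns of S are the scaled signal vectors; Rxx = S S^* as in (3.2).
S = np.column_stack([a * steering(w, M) for a, w in zip(amps, omegas)])
Rxx = S @ S.conj().T
Ryy = Rxx + sigma2 * np.eye(M)          # white noise shifts every eigenvalue by sigma2

lam = np.linalg.eigvalsh(Ryy)[::-1]     # eigenvalues in descending order
signal_eigs, noise_eigs = lam[:N], lam[N:]
print("signal eigenvalues:", signal_eigs)
print("noise eigenvalues: ", noise_eigs)
```

Under these assumptions the first $N$ printed eigenvalues exceed the noise variance, and the remaining $M - N$ equal it to within rounding error.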

First, because the $s_i$ span the signal subspace and because $s(\omega_i)$ and $s(\omega_j)$ are linearly independent as long as $i \neq j$, the only values of $s(\omega)$ that lie entirely in the signal subspace are the $s(\omega_i)$. The signal and noise subspaces are orthogonal and together span the entire $M$-dimensional space, so any vector of length $M$ that does not lie entirely in the signal subspace has a nonzero projection on the noise subspace. This means that the only roots of $s(\omega)^H v$, for any vector $v$ in the noise subspace, are the original frequencies $\omega_i$. For any other value of $\omega$, $s(\omega)$ is linearly independent of the $s_i$, and will have a nonzero projection on the noise subspace, that is, $s(\omega)^H v \neq 0$ if $\omega \neq \omega_i$.

This property allows us to determine the frequencies in the input signal by finding the roots of

$$p(\omega) = \left\| s(\omega)^H V_N \right\|_2^2$$

for $-\pi < \omega < \pi$, where $V_N$ is any $M \times K$ matrix whose columns are in the noise subspace. Subspace estimators that rely on this property are referred to as noise subspace methods.
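As an illustration of the noise subspace property, the sketch below evaluates $p(\omega)$ for an exact, noise-free autocorrelation matrix. It is a hedged example rather than the author's implementation: the function names `steering` and `null_spectrum` and all numerical values are assumptions made for the demonstration.

```python
import numpy as np

def steering(omega, M):
    """Return s(omega) = [1, e^{j omega}, ..., e^{j (M-1) omega}]^T."""
    return np.exp(1j * omega * np.arange(M))

def null_spectrum(omega, V_N):
    """p(omega) = || s(omega)^H V_N ||_2^2 for noise-subspace columns V_N."""
    s = steering(omega, V_N.shape[0])
    return float(np.linalg.norm(s.conj() @ V_N) ** 2)

M = 8
omegas = np.array([0.5, 1.3, -2.0])          # illustrative true frequencies
S = np.column_stack([steering(w, M) for w in omegas])
Rxx = S @ S.conj().T

lam, E = np.linalg.eigh(Rxx)                 # eigenvalues in ascending order
V_N = E[:, : M - len(omegas)]                # eigenvectors of the M - N zero eigenvalues

p_true = [null_spectrum(w, V_N) for w in omegas]   # at the signal frequencies
p_off = null_spectrum(2.5, V_N)                    # at an unrelated frequency
print(p_true, p_off)
```

With exact data, $p(\omega)$ is zero to rounding error at each signal frequency and clearly nonzero at the unrelated frequency.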

A second important property arises because the matrix $R_{yy}$ is Hermitian positive semidefinite. The eigenvectors of $R_{yy}$ are the same as its singular vectors, and its eigenvalues are the same as its singular values. Using the properties of the singular value decomposition, it can be shown that

$$\tilde{R}_{yy} = \sum_{i=1}^{N} (\lambda_i + \sigma_e^2)\, e_i e_i^H \qquad (3.5)$$

is the best rank $N$ approximation, in the 2-norm sense, to $R_{yy}$ [34]. $\tilde{R}_{yy}$ may therefore be used as a noise-reduced approximation to $R_{xx}$. Estimators that follow this approach are referred to as signal subspace estimators.

Signals  in  Colored  Noise 

In the discussion above, it was assumed that the noise was white. If the noise is not white, the subspace methods may still be employed, but the computation of the signal and noise subspaces is somewhat more complicated. First, it is necessary that the noise autocorrelation matrix $\Sigma$ be known or estimated. Then, instead of computing invariant subspaces for the standard problem $R_{yy} e = \lambda e$, the invariant subspaces of the generalized eigenproblem $R_{yy} e = \lambda \Sigma e$ are computed. If $\Sigma$ is nonsingular, the invariant subspaces of $R_{yy} e = \lambda \Sigma e$ are the same as those of the modified problem $\Sigma^{-1} R_{yy} e = \lambda e$; intuitively, we may view this as a whitening transformation, which converts the generalized eigenproblem into a standard problem.

The matrix pencil defined by $R_{yy}$ and $\Sigma$ is definite, since $\Sigma$ is positive definite. In addition to the simple transformation described above, several other conventional methods exist for solving the definite generalized eigenproblem. Further details on the methods used to compute the signal and noise subspaces in the colored noise case are given later in this chapter, where the computation of the subspace methods is discussed.


Eigendecomposition  of  Estimated  Autocorrelation  Matrices 

When the subspace methods are used on noisy data, the autocorrelation is not known exactly, but must be estimated from the data. The properties of the exact autocorrelation matrix described above are not true for the estimated matrix; nevertheless, useful frequency estimates may still be obtained by assuming the properties are good approximations for the estimated matrix as well. Any estimated autocorrelation matrix will have an eigendecomposition given by

$$\hat{R}_{yy} = \sum_{i=1}^{N} \hat{\lambda}_i \hat{e}_i \hat{e}_i^H + \sum_{i=N+1}^{M} \hat{\lambda}_i \hat{e}_i \hat{e}_i^H, \qquad (3.6)$$

where the estimated signal and noise subspaces are shown separately. Note that the estimated noise eigenvalues are unequal with probability one when noise is present [10]. The estimated noise subspace is different from the exact noise subspace, and will no longer be orthogonal to the signal vectors, so the function

$$p(\omega) = \left\| s(\omega)^H \hat{V}_N \right\|_2^2, \qquad (3.7)$$

where $\hat{V}_N$ is any matrix with columns in the estimated noise subspace, will no longer have roots at the $\omega_i$. (Typically, it will not have any real roots at all.) If the autocorrelation estimate is close to the actual autocorrelation, the estimated noise subspace will be a good approximation to the noise subspace of the exact autocorrelation matrix, and it will be possible to estimate the $\omega_i$ by, for example, finding the $N$ smallest minima of (3.7) for $-\pi < \omega < \pi$. Since finding the roots of a polynomial is usually simpler than finding minima of an arbitrary function, an alternative way of finding the minima is to locate the roots of

$$p(z) = \left\| \mathbf{z}^H \hat{V}_N \right\|_2^2, \qquad (3.8)$$

where

$$\mathbf{z} = [1, z, \ldots, z^{M-1}]^T.$$

The $N$ smallest minima of equation (3.7) correspond to the $N$ roots of equation (3.8) closest to the unit circle.

If the input data are real, $R_{yy}$ is real and symmetric, and the eigenvectors of $R_{yy}$ are real. Since a real sinusoid is the sum of 2 complex sinusoids, the dimension of the signal subspace is twice the number of real signals. Because the eigenvectors are real, $p(\omega) = p(-\omega)$ and $p(z) = p(z^*)$ for any choice of $V_N$, as would be expected. This allows the search for minima to be restricted to $0 < \omega < \pi$ for real signals, and the search for roots to be restricted to the upper half-plane.


Estimating  the  Number  of  Signals 

In the preceding discussion, it has been implicitly assumed that the number of signals present in the input data was known. If this is not the case, the need to estimate the number of signals complicates the use of the subspace methods. Analogous difficulties arise with other methods, and the problem of estimating the number of signals is the subject of ongoing research.

In some circumstances, determination of the number of signals is simple: if the signal-to-noise ratio is high, then there will be a gap between the magnitudes of the signal and noise eigenvalues in (3.6) that will indicate the number of signals. At low signal-to-noise ratios, however, it becomes difficult to determine the number of signals in this manner.

One approach is to perform likelihood ratio tests on the eigenvalues of $R_{yy}$. Although this is a straightforward approach, it is complicated by the need to set a threshold for accepting the hypothesis that a certain number of signals are present; this introduces a subjective element into the process, since the choice of threshold is arbitrary. Drawing on information theoretic arguments, several authors [11-14] have proposed alternatives that do not require selection of a threshold. These approaches use an estimator of the form

$$f\left(\max_{\Theta} P_n(\Theta)\right) - p(n),$$

where $f(\max_{\Theta} P_n(\Theta))$ is some function of the maximum value of the likelihood function $P_n(\Theta)$ for a model with $n$ signals, and $p(n)$ is a penalty function, which balances the improvement in likelihood with increasing number of signals against a cost attributed to increasing model complexity. The two most widely used variants of this approach are those due to Akaike [11] (the Akaike information criterion, or AIC), and to Schwarz and Rissanen [12, 13] (the minimum description length criterion, or MDL). In these two cases, the estimated number of signals is given by the value of $n$ that minimizes the functions

$$\mathrm{AIC}(n) = -2 \log\left(\max_{\Theta} P_n(\Theta)\right) + 2N_p,$$

$$\mathrm{MDL}(n) = -\log\left(\max_{\Theta} P_n(\Theta)\right) + \tfrac{1}{2} N_p \log L,$$

where $N_p$ is the number of parameters in the model used to calculate the likelihood,
and $L$ is the number of samples used to estimate the model parameters.

The major obstacle to applying these techniques to the frequency estimation problem is the difficulty of calculating the maximum likelihood for the deterministic signal model, as discussed in Chapter 2. The calculation of the maximum likelihood
for  a single  value  of  n requires  the  use  of  numerical  maximization  procedures  to  search 
for  a maximum  of  the  likelihood  function,  and  the  fact  that  the  likelihood  function 
is  multimodal  requires  the  use  of  a high  precision  frequency  estimation  technique 
to  generate  a starting  value  for  the  maximization.  Often,  subspace  estimators  are 
used  for  this  purpose.  It  is  clearly  impractical  to  attempt  to  use  the  exact  maximum 
likelihood  to  estimate  the  number  of  signals  with  the  AIC  or  MDL,  in  the  context  of 
a subspace  estimation  procedure,  since  calculating  the  maximum  for  just  one  value 
of  n requires  more  computation  than  the  entire  subspace  frequency  estimation. 

An alternative strategy is to employ a different signal model for which the likelihood is more easily calculated. The stochastic signal model, which assumes the signal is stationary and Gaussian, has been widely used for this purpose. Using this model, Wax and Kailath [15, 16] developed forms of the AIC and MDL estimators suitable for use in subspace estimation. These estimators are both based on a measure $\alpha$ of the inequality of the noise subspace eigenvalues:

$$\alpha = \frac{\left(\prod_{i=n+1}^{M} \hat{\lambda}_i\right)^{1/(M-n)}}{\dfrac{1}{M-n} \sum_{i=n+1}^{M} \hat{\lambda}_i}, \qquad (3.9)$$

where the $\hat{\lambda}_i$ are the maximum likelihood estimates of the eigenvalues of $R_{yy}$. Wax [16] shows that, under the stochastic signal model, these are equal to the eigenvalues of the covariance estimate of the autocorrelation matrix.


If the input signals are complex, the estimators are:

$$\mathrm{AIC}_C(n) = -2L(M-n)\log\alpha + 2n(2M-n+1) \qquad (3.10)$$

$$\mathrm{MDL}_C(n) = -L(M-n)\log\alpha + \tfrac{1}{2}\, n(2M-n+1)\log L, \qquad (3.11)$$

where  M is  the  dimension  of  the  autocorrelation  matrix,  and  L is  the  number  of 
points  in  the  time  series  from  which  the  matrix  was  derived.  (Note  that  the  versions 
of  equations  (3.10)  and  (3.11)  given  in  [15,16]  contain  a typographical  error  in  the 
last  term,  which  has  been  corrected  here;  the  corrected  expressions  are  also  given 
in  [17].)  For  real  signals,  the  number  of  model  parameters  should  decrease,  because 
the  eigenvectors  are  real;  however,  the  resulting  mismatch  between  the  likelihood 
term  and  the  penalty  function  sometimes  causes  poor  performance  in  practice,  so  the 
complex  signal  form  is  generally  preferred. 

Since each real signal is associated with two eigenvalues of the autocorrelation matrix, it is reasonable to restrict the values of $n$ in the real signals case to multiples of 2. If the number of real signals is $n_R$, the estimators are

$$\mathrm{AIC}_R(n_R) = -2L(M-2n_R)\log\alpha + 4n_R(2M-2n_R+1), \qquad (3.12)$$

$$\mathrm{MDL}_R(n_R) = -L(M-2n_R)\log\alpha + n_R(2M-2n_R+1)\log L. \qquad (3.13)$$

The estimated number of signals $\hat{N}$ for either method is the value of $n$ that minimizes $\mathrm{AIC}(n)$ or $\mathrm{MDL}(n)$. It has been shown [15] that the AIC is not a consistent estimator of the actual number of signals, while the MDL is; this is in agreement with empirical findings that the AIC tends to overestimate the number of sinusoidal signals in a data record.
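The complex-signal criteria (3.10) and (3.11) are inexpensive to evaluate once the eigenvalues are available. The following sketch is one possible arrangement of that calculation, assuming $\alpha$ in (3.9) is the ratio of the geometric mean to the arithmetic mean of the $M - n$ smallest eigenvalues; the function names and the eigenvalues in the example are hypothetical.

```python
import numpy as np

def alpha(eigs, n):
    """Geometric mean over arithmetic mean of the M - n smallest eigenvalues, eq. (3.9)."""
    tail = np.sort(eigs)[::-1][n:]
    return np.exp(np.mean(np.log(tail))) / np.mean(tail)

def aic_c(eigs, n, L):
    """Complex-signal AIC, eq. (3.10)."""
    M = len(eigs)
    return -2 * L * (M - n) * np.log(alpha(eigs, n)) + 2 * n * (2 * M - n + 1)

def mdl_c(eigs, n, L):
    """Complex-signal MDL, eq. (3.11)."""
    M = len(eigs)
    return -L * (M - n) * np.log(alpha(eigs, n)) + 0.5 * n * (2 * M - n + 1) * np.log(L)

def estimate_order(eigs, L, criterion=mdl_c):
    """Return the n in 0..M-1 that minimizes the chosen criterion."""
    return min(range(len(eigs)), key=lambda n: criterion(eigs, n, L))

# Made-up eigenvalues: two dominant values followed by a nearly flat noise floor.
eigs = np.array([9.1, 5.3, 0.12, 0.11, 0.10, 0.09, 0.095, 0.105])
n_mdl = estimate_order(eigs, L=100)
n_aic = estimate_order(eigs, L=100, criterion=aic_c)
print(n_mdl, n_aic)
```

For this eigenvalue set both criteria select two signals, because the likelihood term drops sharply once the two dominant eigenvalues are excluded from the noise set and the penalty then dominates.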

Although subsequent work [18, 19] has developed versions of the MDL that are applicable for a wider range of input signals, there is no entirely satisfactory method for estimating the number of signals. Since the primary focus of this work is on the computational efficiency of subspace methods, rather than on estimating the number of signals, we will restrict our attention to developing techniques for computing the AIC and MDL with the fast subspace algorithms developed later. These techniques are described in Chapter 8.

Subspace  Methods 

Although  many  different  subspace  methods  have  been  proposed,  and  each  has 
its  own  advantages  and  disadvantages,  only  a few  techniques  have  achieved  widespread 
application.  To  illustrate  the  similarities  between  the  subspace  techniques,  and  to 
provide  a framework  for  later  discussion  of  fast  subspace  methods,  we  will  briefly 
describe  the  first  subspace  technique,  the  Pisarenko  harmonic  decomposition,  and 
then  discuss  the  three  most  widely  used  techniques:  the  principal  components  method, 
MUSIC,  and  ESPRIT. 

After the discussion of each technique, a brief listing of the computational steps for calculating frequency estimates is given. The algorithms employ a mixture of conventional mathematical notation and MATLAB-style notation. In MATLAB notation, i : j is used to refer to a range of array indices, so that a(3 : 6) is a vector composed of elements 3 through 6 of a, and B(1 : 2, 1 : 3) is the leading 2 x 3 submatrix of B.

The  Pisarenko  Harmonic  Decomposition 

The Pisarenko harmonic decomposition [20] was the first of the subspace techniques to be proposed. It assumes that the number of signals is known, and that the dimension $M$ of the autocorrelation matrix is chosen to be one more than the number of signals (or $M = 2N_R + 1$ for real signals). The estimated noise subspace is therefore spanned by a single eigenvector $\hat{e}_M$. Then, using (3.7) or (3.8), with $\hat{V}_N = \hat{e}_M$, the estimated frequencies $\hat{\omega}_i$ are found. Note that there is no ambiguity in the selection of minima or roots, since the polynomial has at most $N$ minima on the unit circle or roots in the complex plane, the same as the number of signals.

The statistical properties of the Pisarenko harmonic decomposition have been examined by Stoica and Nehorai [21]. The technique is asymptotically unbiased, but statistically inefficient, that is, the variances of the frequency estimates do not approach the Cramer-Rao bounds as the number of samples increases. In practice, the performance of the Pisarenko method is inferior to other subspace techniques; its principal advantage is its simplicity.


Algorithm 3.1: Pisarenko Harmonic Decomposition

Determine the number of signals N
Estimate the (N+1) x (N+1) signal autocorrelation matrix R
if noise is colored
    Estimate the (N+1) x (N+1) noise autocorrelation matrix Σ
else
    Σ = I
end
Compute the eigenvector e_{N+1} associated with the
    smallest eigenvalue of Re = λΣe
p(z) = [1, z^{-1}, z^{-2}, ..., z^{-N}] e_{N+1}
Find the N roots z_1 ... z_N of p(z)
for i = 1 : N
    ω_i = angle(z_i)
end
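The white noise branch of Algorithm 3.1 can be sketched in a few lines of NumPy. This is an illustrative rendering under stated assumptions, not the author's code: the function name `pisarenko` is hypothetical, and the test matrix is an exact autocorrelation built from two made-up frequencies, so the estimates are exact to rounding error.

```python
import numpy as np

def pisarenko(R):
    """Frequency estimates from an (N+1) x (N+1) autocorrelation matrix, white noise case."""
    lam, E = np.linalg.eigh(R)      # eigenvalues in ascending order
    e_min = E[:, 0]                 # eigenvector of the smallest eigenvalue
    # p(z) = sum_k e_min[k] z^{-k}. Multiplying by z^N gives a polynomial whose
    # coefficients, highest power first, are the entries of e_min.
    roots = np.roots(e_min)
    return np.sort(np.angle(roots))

N, sigma2 = 2, 0.05                              # illustrative values
omegas = np.array([-1.1, 0.7])                   # illustrative true frequencies
k = np.arange(N + 1)
S = np.exp(1j * np.outer(k, omegas))             # columns are s(omega_i)
R = S @ S.conj().T + sigma2 * np.eye(N + 1)      # exact autocorrelation in white noise

omega_hat = pisarenko(R)
print(omega_hat)
```

With an estimated rather than exact matrix the roots move off the unit circle, and the angles become approximations to the true frequencies.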


The  Principal  Components  Method 

The principal components method [22-24] of Kumaresan and Tufts is a signal subspace method that uses a reduced rank approximation to $R_{yy}$ to solve the linear prediction equations. From equation (3.5), the rank $N$ matrix that is closest in the 2-norm to $R$ is

$$R_{PC} = \sum_{i=1}^{N} \lambda_i e_i e_i^H. \qquad (3.14)$$

Note here that the signal subspace should be chosen so that $R_{PC}$ is always positive semidefinite, since negative eigenvalues can only arise due to the error in $R_{yy}$, and should only be associated with the noise subspace. The reduced-rank matrix $R_{PC}$ is used to solve the Yule-Walker equation

$$R_{PC}\, a = -r. \qquad (3.15)$$

Since the solution to (3.15) is not unique, the solution is chosen that minimizes the length of $a$. This may be efficiently determined by using the properties of the singular value decomposition and the fact that the SVD of $R_{PC}$ is given by (3.14), since $R_{PC}$ is positive semidefinite:

$$a = -R_{PC}^{+}\, r = -\sum_{i=1}^{N} \frac{1}{\lambda_i}\, e_i e_i^H r \qquad (3.16)$$

(where $R_{PC}^{+}$ is the pseudoinverse of $R_{PC}$). The $N$ roots of

$$p(z) = [1, z^{-1}, \ldots, z^{-M}] \begin{bmatrix} 1 \\ a \end{bmatrix} \qquad (3.17)$$

closest to the unit circle are used to find the principal components estimates of the frequencies $\omega_i$. It has been shown [23, 24] that the use of a rank reduced matrix $R_{PC}$ with $N < M$ causes the polynomial (3.17) to have $N$ roots very close to the unit circle, with the remaining roots clustered well inside the unit circle, simplifying identification of the desired roots.

In comparison with MUSIC and ESPRIT, less analytical work has been performed on the statistical properties of the principal components method. Simulation results [22] indicate that the technique has a higher threshold signal-to-noise ratio than MUSIC, but otherwise similar performance, and an analytical comparison between the two techniques [25] yields the same conclusion.

Algorithm 3.2: Principal Components Method

Estimate the M x M signal autocorrelation matrix R
if noise is colored
    Estimate the M x M noise autocorrelation matrix Σ
else
    Σ = I
end
if number of signals N is unknown
    Compute the eigenvalues of Re = λΣe
    Estimate the number of complex signals N, using the AIC or MDL
end
Compute the N largest eigenvalues and the associated eigenvectors
a = -sum_{i=1}^{N} e_i e_i^H r / λ_i
p(z) = 1 + sum_{i=1}^{M} z^{-i} a(i)
Find the N roots z_1 ... z_N of p(z) closest to the unit circle
for i = 1 : N
    ω_i = angle(z_i)
end
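The steps of the principal components method can be sketched as follows for the white noise case. Several details here are assumptions made for the example: the function name `principal_components` is hypothetical, $r$ is taken to be the vector of autocorrelation lags 1 through $M$, and the test data are exact noise-free lags from three made-up sinusoids, for which the $N$ selected roots fall on the unit circle.

```python
import numpy as np

def principal_components(R, r, N):
    """Minimum-norm solution of R_PC a = -r from the N principal eigenpairs of R."""
    lam, E = np.linalg.eigh(R)
    lam, E = lam[::-1][:N], E[:, ::-1][:, :N]     # N largest eigenvalues and eigenvectors
    a = -E @ ((E.conj().T @ r) / lam)             # a = -sum_i e_i e_i^H r / lambda_i
    coeffs = np.concatenate(([1.0], a))           # z^M p(z), highest power first
    roots = np.roots(coeffs)
    closest = roots[np.argsort(np.abs(np.abs(roots) - 1.0))[:N]]
    return np.sort(np.angle(closest))

M = 8
omegas = np.array([-1.1, 0.7, 2.0])               # illustrative true frequencies
powers = np.array([1.0, 2.0, 0.5])                # illustrative signal powers

def lag(m):
    """Noise-free autocorrelation at lag m."""
    return np.sum(powers * np.exp(1j * omegas * m))

R = np.array([[lag(k - l) for l in range(M)] for k in range(M)])
r = np.array([lag(m) for m in range(1, M + 1)])

omega_hat = principal_components(R, r, N=3)
print(omega_hat)
```

Dividing by only the $N$ principal eigenvalues is what implements the pseudoinverse in (3.16); the small eigenvalues, which would amplify estimation error, never enter the solution.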


The  MUSIC  Algorithm 

MUSIC (an acronym for Multiple Signal Classification) was developed by Schmidt [10, 26] in the late 1970s. For uniformly sampled time series, MUSIC is a straightforward generalization of the Pisarenko harmonic decomposition. Rather than a single vector, it employs a matrix composed of several vectors from the noise subspace,

$$\hat{V}_N = [\hat{e}_{N+1}, \ldots, \hat{e}_M],$$

and uses (3.7) or (3.8) to find the MUSIC estimates of the $\hat{\omega}_i$. (The root-finding
variant (3.8) is referred to as root-MUSIC, and was first suggested by Barabell [27].)

Since there are generally many more eigenvectors in the noise subspace than in the signal subspace, it is important to note that use of the identity

$$\sum_{i=1}^{N} e_i e_i^H + \sum_{i=N+1}^{M} e_i e_i^H = I \qquad (3.18)$$

allows computation of whatever set of eigenvectors is smaller, and may greatly reduce the amount of computation required to compute noise subspace estimators:

$$p(z) = \mathbf{z}^H E_N E_N^H \mathbf{z} = \mathbf{z}^H \left(I - E_S E_S^H\right) \mathbf{z},$$

where $E_S = [e_1, \ldots, e_N]$ and $E_N = [e_{N+1}, \ldots, e_M]$.

The statistical properties of MUSIC have been widely studied [2, 8, 28-31]. It has been shown [2] that MUSIC is asymptotically efficient, and that it is an approximate maximum likelihood estimator for large data records. Surprisingly, it has also been shown that MUSIC can in some circumstances have a lower variance than the maximum likelihood estimator [8], which is possible since the MLE is not efficient for a finite number of samples. Stoica has analyzed the performance of MUSIC and ESPRIT for the estimation of sinusoidal parameters [29], with ESPRIT found to be slightly superior in terms of minimizing the mean squared error of the frequency estimates. A companion paper [32], which analyzed the two techniques for array bearing
estimation, found MUSIC to be superior for that application.

Algorithm 3.3: Root-MUSIC

Estimate the M x M signal autocorrelation matrix R
if noise is colored
    Estimate the M x M noise autocorrelation matrix Σ
else
    Σ = I
end
if number of signals N is unknown
    Compute the eigenvalues of Re = λΣe
    Estimate the number of complex signals N, using the AIC or MDL
end
if N < M/2
    Compute the eigenvectors associated with the N largest eigenvalues
    E_S = [e_1, ..., e_N]
    U = I - E_S E_S^H
else
    Compute the eigenvectors associated with the M - N smallest eigenvalues
    E_N = [e_{N+1}, ..., e_M]
    U = E_N E_N^H
end
p(z) = z^H U z, where z = [1, z, z^2, ..., z^{M-1}]^T
Find the N roots z_1 ... z_N of p(z) closest to the unit circle
for i = 1 : N
    ω_i = angle(z_i)
end
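The construction of $U$ from the smaller set of eigenvectors can be illustrated with the spectral-search form (3.7) of MUSIC. Note that this sketch deliberately swaps the polynomial rooting of Algorithm 3.3 for a grid search over $\omega$, which is simpler to demonstrate with exact data; the function names, grid size, and test frequencies are all assumptions made for the example.

```python
import numpy as np

def music_spectrum(R, N, grid):
    """Evaluate p(omega) = s(omega)^H U s(omega) on a frequency grid (white noise case)."""
    M = R.shape[0]
    lam, E = np.linalg.eigh(R)                      # eigenvalues in ascending order
    if N < M / 2:
        E_S = E[:, M - N:]                          # N principal eigenvectors
        U = np.eye(M) - E_S @ E_S.conj().T          # uses identity (3.18)
    else:
        E_N = E[:, :M - N]                          # M - N noise eigenvectors
        U = E_N @ E_N.conj().T
    A = np.exp(1j * np.outer(np.arange(M), grid))   # columns are s(omega)
    return np.real(np.sum(A.conj() * (U @ A), axis=0))

def smallest_minima(p, grid, N):
    """Return the grid frequencies of the N smallest local minima of p."""
    is_min = (p < np.roll(p, 1)) & (p < np.roll(p, -1))
    idx = np.flatnonzero(is_min)
    best = idx[np.argsort(p[idx])[:N]]
    return np.sort(grid[best])

M, N, sigma2 = 8, 2, 0.1                            # illustrative values
omegas = np.array([0.6, 1.9])                       # illustrative true frequencies
S = np.exp(1j * np.outer(np.arange(M), omegas))
R = S @ S.conj().T + sigma2 * np.eye(M)             # exact autocorrelation in white noise

grid = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
omega_hat = smallest_minima(music_spectrum(R, N, grid), grid, N)
print(omega_hat)
```

The accuracy of this variant is limited by the grid spacing, which is one practical reason for preferring the rooting form when a polynomial solver is available.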


The  ESPRIT  Algorithm 

ESPRIT (Estimation of Signal Parameters via Rotational Invariance Techniques) was developed in the mid-1980s by Roy and Kailath [33-35]. As its name implies, it is based on a different principle than other subspace methods, namely, the invariance of the signal subspace to time shifts. We have shown above that the signal subspace is spanned by the $N$ vectors $s_i$; that is, the columns of the matrix

$$S = [s_1, s_2, \ldots, s_N] = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j\omega_1} & e^{j\omega_2} & \cdots & e^{j\omega_N} \\ e^{j2\omega_1} & e^{j2\omega_2} & \cdots & e^{j2\omega_N} \\ \vdots & \vdots & & \vdots \\ e^{j(M-1)\omega_1} & e^{j(M-1)\omega_2} & \cdots & e^{j(M-1)\omega_N} \end{bmatrix}$$

form a basis for the signal subspace. Consider the two submatrices of $S$ composed of its first $M - 1$ rows, and its last $M - 1$ rows,

$$S = \begin{bmatrix} S_1 \\ \times \; \times \; \times \end{bmatrix} = \begin{bmatrix} \times \; \times \; \times \\ S_2 \end{bmatrix}.$$

These matrices are related by $S_2 = S_1 \Omega$, where

$$\Omega = \begin{bmatrix} e^{j\omega_1} & 0 & \cdots & 0 \\ 0 & e^{j\omega_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{j\omega_N} \end{bmatrix}. \qquad (3.19)$$

This relationship between the two submatrices is a consequence of the fact that each of the signals is invariant under a time shift, except for multiplication by a constant. The choice of $S_1$ and $S_2$ given above selects the two submatrices that have maximum overlap; although this is the most common choice, and appears to give the best performance for frequency estimation, many other choices are possible. A generalized variant of ESPRIT that attempts to exploit all possible invariances in the context of
array processing is described in [36].

The set of signal eigenvectors that are the columns of $E$ spans the same space as the columns of $S$, so the two matrices must be related by a nonsingular transformation $E = [e_1, e_2, \ldots, e_N] = SW$. Partitioning $E$ in the same manner as $S$ and using (3.19),

$$E = \begin{bmatrix} E_1 \\ \times \; \times \; \times \end{bmatrix} = \begin{bmatrix} S_1 W \\ \times \; \times \; \times \end{bmatrix} = \begin{bmatrix} \times \; \times \; \times \\ E_2 \end{bmatrix} = \begin{bmatrix} \times \; \times \; \times \\ S_2 W \end{bmatrix}.$$

This means that the submatrices of $E$ are also related by a nonsingular transformation $E_2 = S_2 W = S_1 \Omega W = E_1 W^{-1} \Omega W$, or

$$E_1 W^{-1} \Omega W = E_1 X = E_2. \qquad (3.20)$$

Since $W$ must be nonsingular, $X$ and $\Omega$ are similar, and therefore the eigenvalues of $X$ are the same as the eigenvalues of $\Omega$. The arguments of the eigenvalues of $\Omega$ are the original frequencies, so if $X$ is found, the frequencies can be easily determined.

When an estimate of the autocorrelation matrix is used, the partitions of the matrix of signal subspace eigenvectors will not exactly satisfy (3.20). An approximate solution must be sought, and the most common approach is to solve (3.20) in the total least squares sense, that is, to find an $X_{TLS}$ that solves

$$(E_1 + F_1)\, X_{TLS} = (E_2 + F_2)$$

so that the Frobenius norm of the error matrices

$$\left\| [F_1 \;\; F_2] \right\|_F$$

is minimized. This approach is referred to as TLS-ESPRIT. If (3.20) is solved in the least squares sense, the method is sometimes called LS-ESPRIT; the TLS approach yields superior estimates. The total least squares problem may be solved using the singular value decomposition, as described in [37]. When $X_{TLS}$ has been found, the ESPRIT frequency estimates are taken to be the arguments (angles) of the eigenvalues of $X_{TLS}$.

The statistical properties of ESPRIT have been heavily analyzed [29, 32-35, 38]. It has been shown [34, 35] that ESPRIT is an approximate maximum likelihood estimator, and that it is consistent and asymptotically efficient. For estimation of sinusoidal frequencies, an analysis by Stoica [29] has shown that ESPRIT (referred to in that work as SURE) is slightly more accurate than MUSIC in most cases.


Algorithm 3.4: TLS-ESPRIT

Estimate the M x M signal autocorrelation matrix R
if noise is colored
    Estimate the M x M noise autocorrelation matrix Σ
else
    Σ = I
end
if number of signals N is unknown
    Compute the eigenvalues of Re = λΣe
    Estimate the number of complex signals N, using the AIC or MDL
end
Compute the eigenvectors associated with the N largest eigenvalues
E = [e_1, ..., e_N]
E_1 = E(1 : M-1, 1 : N)
E_2 = E(2 : M, 1 : N)
Solve E_1 X = E_2 in the total least squares sense
Find the N eigenvalues σ_1 ... σ_N of X
for i = 1 : N
    ω_i = angle(σ_i)
end
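A compact NumPy rendering of the white noise branch of Algorithm 3.4 is sketched below. The total least squares step uses the standard construction from the right singular vectors of $[E_1 \; E_2]$, which is assumed here to match the SVD-based method cited in the text; the function name `tls_esprit` and the test values are hypothetical.

```python
import numpy as np

def tls_esprit(R, N):
    """TLS-ESPRIT frequency estimates from an M x M autocorrelation matrix."""
    lam, E = np.linalg.eigh(R)
    E = E[:, ::-1][:, :N]                  # eigenvectors of the N largest eigenvalues
    E1, E2 = E[:-1, :], E[1:, :]           # first and last M - 1 rows
    # TLS solution of E1 X = E2: partition the right singular vectors of [E1 E2].
    _, _, Vh = np.linalg.svd(np.hstack([E1, E2]))
    V = Vh.conj().T
    V12, V22 = V[:N, N:], V[N:, N:]
    X = -V12 @ np.linalg.inv(V22)
    return np.sort(np.angle(np.linalg.eigvals(X)))

M, sigma2 = 8, 0.1                                   # illustrative values
omegas = np.array([-2.0, 0.5, 1.3])                  # illustrative true frequencies
S = np.exp(1j * np.outer(np.arange(M), omegas))
R = S @ S.conj().T + sigma2 * np.eye(M)              # exact autocorrelation in white noise

omega_hat = tls_esprit(R, N=3)
print(omega_hat)
```

Because no polynomial rooting or spectral search is involved, the frequency estimates come directly from an $N \times N$ eigenproblem.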


Computational  Considerations 

In subsequent chapters, new algorithms for Toeplitz eigendecomposition will be developed and employed to reduce the computation required for frequency estimation using subspace techniques. To provide a context for evaluating these improvements, we will briefly examine the computational cost, in terms of operation counts and memory requirements, of the subspace techniques when they are implemented using conventional methods. All of the computational costs given will be for the real signals case, since real signals are assumed in the remainder of this work.

We can see immediately that all of the conventional implementations require $O(M^2)$ memory locations; the amount of computation required depends on the particular algorithm used, and on whether the noise is white or colored. The cost of each of the three most widely used algorithms is discussed in more detail below.

Estimating frequencies with a subspace algorithm requires three basic computational steps: estimate the signal and noise correlation matrices, compute some of the eigenvectors (an invariant subspace) of the matrix pencil $(R, \Sigma)$, and process this invariant subspace in some fashion to determine the frequencies. If there are $L$ points in the input data record, $N$ sinusoidal signals, and an $M \times M$ autocorrelation matrix is used, then estimation of the correlation matrices is typically $O(LM)$, computation of the invariant subspace is $O(M^3)$, and estimation of the frequencies from the invariant subspace is approximately $O(M^2 N)$ for the principal components method and MUSIC, and $O(MN^2)$ for ESPRIT. Since $M$ is larger than $N$, and is typically one-tenth to one-half the size of $L$, the computational cost of conventional methods is dominated by the computation of the invariant subspace.

The  cost  of  invariant  subspace  computation  varies  depending  on  whether  the 
noise  is  white  or  colored.  In  mathematical  terms,  the  white  noise  problem  requires 
computation  of  an  invariant  subspace  of  a symmetric  matrix,  and  the  colored  noise 
problem requires computing an invariant subspace of a definite matrix pencil.

White Noise

When conventional techniques for the symmetric eigenproblem (Householder reduction to tridiagonal form, followed by QR or QL iteration) are used, approximately $4M^3/3$ floating-point operations are required to compute the eigenvalues for the white noise case. Computing both the eigenvalues and the eigenvectors by the same method requires approximately $9M^3$ floating-point operations, because each of the orthogonal transformations used to reduce $R$ to tridiagonal form must be accumulated for use in computing the eigenvectors. If fewer than 23 eigenvectors are required, it is more economical to compute only the eigenvalues with the QL algorithm, since the eigenvectors may then be computed by inverse iteration at a cost of $M^3/3$ operations each.

Inverse "iteration" with accurate eigenvalues typically requires only one solution of the equation $(R - \lambda_i I) x_i = z$, with $z$ initialized to a random vector, to produce an accurate eigenvector $x_i$. Inverse iteration will fail if $z$ is orthogonal to $x_i$, but this is extremely unlikely for random $z$; difficulties may also arise when $R$ has repeated eigenvalues, because inverse iteration will find only one of the eigenvectors associated with the repeated values. When $R$ is a matrix calculated from noisy data, however, repeated eigenvalues are also extremely unlikely. In most cases, then, inverse iteration is a viable alternative to accumulating the transformations in the QR or QL algorithms, as long as the dimension of the required subspace is small.
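A single step of inverse iteration can be sketched as follows. This is an illustrative example under stated assumptions: the function name `inverse_iteration` and the small relative shift that keeps the system nonsingular are choices made here, the test matrix is an arbitrary symmetric Toeplitz matrix, and `numpy.linalg.solve` uses a general LU factorization, so the sketch does not reproduce the $M^3/3$ operation count of a symmetric solver.

```python
import numpy as np

def inverse_iteration(R, lam, rng, rel_shift=1e-10):
    """One inverse-iteration step: solve (R - mu I) x = z for a random z, then normalize."""
    M = R.shape[0]
    mu = lam + rel_shift * np.linalg.norm(R)   # tiny shift keeps R - mu*I nonsingular
    z = rng.standard_normal(M)                 # random starting vector
    x = np.linalg.solve(R - mu * np.eye(M), z)
    return x / np.linalg.norm(x)

# Symmetric Toeplitz test matrix with entries 4 * 0.5^|i - j| (illustrative).
idx = np.arange(6)
R = 4.0 * 0.5 ** np.abs(np.subtract.outer(idx, idx))

lam_max = np.linalg.eigvalsh(R)[-1]            # eigenvalue computed without eigenvectors
x = inverse_iteration(R, lam_max, np.random.default_rng(0))
residual = np.linalg.norm(R @ x - lam_max * x)
print(residual)
```

The residual norm printed at the end is the kind of quantity on which an accuracy test for a computed eigenpair can be based.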

Colored  Noise 

In the colored noise case, several computational approaches are possible. We
have already described one technique, which computes the eigenvalues of $\boldsymbol{\Sigma}^{-1}\mathbf{R}$; it
is rarely used because other methods have better numerical properties and equal
computational cost. One common approach is to compute the Cholesky factorization
$\boldsymbol{\Sigma} = \mathbf{G}^T\mathbf{G}$, where $\mathbf{G}$ is upper triangular, and then compute the eigenvectors and
eigenvalues of $\mathbf{G}^{-T}\mathbf{R}\mathbf{G}^{-1}$. This transforms the generalized eigenvalue problem into a
standard eigenvalue problem, where the multiplication by $\mathbf{G}^{-T}$ and $\mathbf{G}^{-1}$ may also be
viewed as a whitening transformation:

$$
\begin{aligned}
\mathbf{G}^{-T}\mathbf{R}\mathbf{G}^{-1}\mathbf{y} &= \lambda\,\mathbf{G}^{-T}\boldsymbol{\Sigma}\mathbf{G}^{-1}\mathbf{y} \\
&= \lambda\,\mathbf{G}^{-T}\mathbf{G}^{T}\mathbf{G}\mathbf{G}^{-1}\mathbf{y} \\
&= \lambda\,\mathbf{y}
\end{aligned}
$$

The eigenvalues of $\mathbf{G}^{-T}\mathbf{R}\mathbf{G}^{-1}\mathbf{y} = \lambda\mathbf{y}$ are the same as those of the original problem
$\mathbf{R}\mathbf{x} = \lambda\boldsymbol{\Sigma}\mathbf{x}$, and the eigenvectors of the original problem are given by $\mathbf{x} = \mathbf{G}^{-1}\mathbf{y}$. The
computational cost is therefore the cost of solving the standard eigenproblem, plus an
additional $M^3/3$ operations for the Cholesky decomposition, $M^3/3$ for inverting $\mathbf{G}$,
$2M^3$ for computing $\mathbf{G}^{-T}\mathbf{R}\mathbf{G}^{-1}$, and $M^3/3$ per eigenvector for multiplying by $\mathbf{G}^{-1}$,
for a total of $(N + 5)M^3/3$ additional operations.
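The whitening transformation above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `generalized_eig_cholesky` and the random test matrices are hypothetical, and because `numpy.linalg.cholesky` returns the lower-triangular factor, it is transposed to obtain the upper-triangular $\mathbf{G}$ with $\boldsymbol{\Sigma} = \mathbf{G}^T\mathbf{G}$.

```python
import numpy as np

def generalized_eig_cholesky(R, Sigma):
    """Solve R x = lam * Sigma x for symmetric R and positive definite Sigma."""
    G = np.linalg.cholesky(Sigma).T        # upper triangular, Sigma = G^T G
    G_inv = np.linalg.inv(G)
    W = G_inv.T @ R @ G_inv                # whitened matrix G^{-T} R G^{-1}
    W = 0.5 * (W + W.T)                    # remove rounding asymmetry
    lam, Y = np.linalg.eigh(W)             # standard symmetric eigenproblem
    X = G_inv @ Y                          # back-transform: x = G^{-1} y
    return lam, X

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
R = A @ A.T                                # hypothetical autocorrelation matrix
Sigma = B @ B.T + 5.0 * np.eye(5)          # hypothetical colored-noise covariance
lam, X = generalized_eig_cholesky(R, Sigma)
max_residual = np.max(np.abs(R @ X - Sigma @ X * lam))
```

The columns of `X` satisfy the original generalized problem, which `max_residual` checks directly. Forming `G_inv` explicitly mirrors the operation count given in the text; a triangular solve would be the usual choice in production code.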

The  common  denominator  of  all  the  subspace  estimators  described  above  is 
the  requirement  to  compute  the  eigendecomposition  of  the  estimated  autocorrelation 
matrix. When conventional techniques are used, the computational cost of this
computation is proportional to $M^3$ for an $M \times M$ matrix; this is true even when, as is
often the case with the subspace methods, only a small set of eigenvectors must be
computed. In Chapter 6, we will examine methods that compute any desired set of
$N$ eigenvalues of a Toeplitz matrix and their associated eigenvectors, at a cost which
is proportional to $NM^2$ or $NM\log^2 M$, depending on the choice of technique.

The  price  required  for  this  dramatic  reduction  in  computation  is  a restriction 
on  the  form  of  the  autocorrelation  matrix  estimators  which  may  be  used.  To  determine 
the  effect  of  this  restriction  on  the  performance  of  the  subspace  techniques,  a detailed 
analysis of the accuracy of several widely used estimators, and the effect of errors
in estimating the autocorrelation on the invariant subspaces of the autocorrelation
matrix, is presented in the next chapter.


CHAPTER  4 

AUTOCORRELATION  ESTIMATES  FOR  SUBSPACE  METHODS 


In  the  preceding  chapter,  existence  of  an  estimate  of  the  autocorrelation  matrix 
of  the  input  signal  was  assumed;  in  this  chapter,  we  will  examine  some  common 
techniques  for  computing  this  estimate.  Because  the  deterministic  signal  model  is 
nonstationary  and  nonergodic,  the  issue  of  autocorrelation  matrix  estimation  is  more 
complex  than  it  is  for  the  usual  case  of  stationary  random  processes.  Since  the  fast 
eigendecomposition  techniques  that  are  developed  in  later  chapters  require  a Toeplitz 
matrix,  we  will  be  particularly  interested  in  the  effects  of  using  Toeplitz  estimates  of 
the  autocorrelation  matrix. 

The first important issue in estimating the autocorrelation of deterministic
sinusoids is a consequence of the nonstationarity of the signals. From this point
onward, we will return to the use of real, rather than complex, signals. Under the
deterministic signal model discussed in Chapter 2, the data $y[k]$ are assumed to be of
the form

$$y[k] = x[k] + n[k],$$

where

$$x[k] = \sum_{i=1}^{N} a_i \cos(\omega_i k + \phi_i),$$

and the noise $n[k]$ is Gaussian, stationary and uncorrelated with $x[k]$, with zero mean
and covariance matrix $\boldsymbol{\Sigma}$. The output of this model is not wide-sense stationary, and
therefore it is also not ergodic in the wide sense, that is, the statistical expected value
of $y[k]\,y[k+n]$ is not equal to its time average over an infinite interval. Since the
process is nonstationary, its statistical autocorrelation matrix is not Toeplitz.

The  derivation  of  the  subspace  methods  given  in  Chapter  3 reflects  this  fact; 
the  subspace  methods  were  derived  using  the  autocorrelation  of  a deterministic  signal. 
To  use  a Toeplitz  matrix  in  subspace  estimation  techniques,  it  is  first  necessary  to 
show  that  a Toeplitz  matrix  exists  that  yields  exact  frequency  estimates;  if  such  a 
matrix  does  not  exist,  then  fast  subspace  methods  based  on  Toeplitz  estimates  can 
never  equal  the  performance  of  the  original  (covariance-based)  subspace  methods. 

One common approach is to simply redefine the signal model so that the signal
is stationary. If $\phi$ is taken to be a random variable uniformly distributed between
$0$ and $2\pi$, the resulting signal is stationary (but not ergodic or Gaussian). Under
this signal model, which we will refer to as the semi-deterministic model, the
autocorrelation matrix of a signal composed of $N$ sinusoidal signals in additive noise has
elements given by

$$\mathbf{R}_{SD}(i,j) = \frac{1}{2}\sum_{n=1}^{N} a_n^2 \cos(\omega_n |i-j|) + \boldsymbol{\Sigma}(i,j).$$

The  use  of  a model  with  random  phase  implies  that  the  frequency  estimation  is  per- 
formed without  taking  advantage  of  any  phase  information  that  may  become  available. 
Since  the  Cramer-Rao  bounds  for  frequency  estimation  are  asymptotically  decoupled 
from  the  phase,  as  shown  in  Chapter  2,  it  is  possible  for  a Toeplitz  estimator  based 
on  this  model  to  have  nearly  equal  performance  to  the  covariance  estimator  for  long 
data  records. 

By  examining  the  behavior  of  the  conventional  methods  as  the  number  of 
samples  becomes  large,  we  can  see  how  this  may  be  achieved.  In  the  following  sections, 
we  will  show  that  when  the  input  consists  of  deterministic  sinusoids  in  stationary, 
additive Gaussian noise, all the estimation methods considered here, including the
covariance method on which the conventional subspace algorithms are based, converge
to the same Toeplitz matrix as the number of data points $L$ goes to infinity. To
distinguish this time-averaged “autocorrelation matrix,” which is derived from a single
time series, from the (non-Toeplitz) statistical autocorrelation matrix obtained from
the entire ensemble of possible time series, we will refer to the autocorrelation obtained
by time averaging as the temporal autocorrelation, and denote it by $\mathbf{R}$. It will also
be shown in the following section that the temporal autocorrelation is equal to the
statistical autocorrelation matrix $\mathbf{R}_{SD}$ of the semi-deterministic signal model with
the same amplitude and frequency parameters as the deterministic model.

Both  MUSIC  and  ESPRIT  have  been  shown  to  be  asymptotically  efficient 
[2,38],  that  is,  as  the  number  of  samples  becomes  large,  the  estimation  error  of  the 
techniques  approaches  the  Cramer-Rao  bound.  As  the  number  of  samples  becomes 
large,  the  covariance  estimate  of  the  autocorrelation  approaches  a Toeplitz  matrix, 
and  the  error  in  the  frequency  estimate  approaches  the  minimum  possible  error.  If  it 
is  possible  to  find  a Toeplitz  autocorrelation  estimate  that  is  a good  approximation 
to  the  temporal  autocorrelation,  it  is  then  possible  to  use  this  Toeplitz  matrix  in  the 
subspace  methods  without  a significant  loss  of  accuracy. 

The  Temporal  Autocorrelation  Matrix 

Before  discussing  the  various  methods  of  estimating  the  correlation  matrix, 
it  will  be  useful  to  derive  the  exact  temporal  autocorrelation  matrix  for  sinusoidal 
signals  in  noise.  If  the  exact  temporal  autocorrelation  R is  used  in  an  asymptotically 
efficient  subspace  estimator,  an  exact  frequency  estimate  results.  Because  of  this,  R 
can  serve  as  a standard  for  evaluating  the  accuracy  of  the  various  estimators,  and 
allow  us  to  bound  the  error  in  the  subspace  estimates  derived  from  an  estimated 
autocorrelation matrix.

Using the deterministic signal model, the elements of the exact temporal
autocorrelation matrix are given by

$$\mathbf{R}(i,j) = \lim_{L\to\infty}\left\{\frac{1}{L}\sum_{k=0}^{L-1} y[i+k]\,y[j+k]\right\}. \tag{4.1}$$


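By expanding $y[k]$ as the sum of the signal and the noise, we can write this as

$$
\begin{aligned}
\mathbf{R}(i,j) = \lim_{L\to\infty}\frac{1}{L}\Bigg\{
&\sum_{k=0}^{L-1} x[i+k]\,x[j+k]
+ \sum_{k=0}^{L-1} n[i+k]\,x[j+k] \\
&+ \sum_{k=0}^{L-1} x[i+k]\,n[j+k]
+ \sum_{k=0}^{L-1} n[i+k]\,n[j+k]\Bigg\}.
\end{aligned}
\tag{4.2}
$$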

Because $x[k]$ and $n[k]$ are uncorrelated and zero-mean, the limits of the two cross
terms involving $x[k]$ and $n[k]$ are zero, and the elements of the exact autocorrelation
matrix are

$$
\begin{aligned}
\mathbf{R}(i,j) &= \lim_{L\to\infty}\left\{\frac{1}{L}\sum_{k=0}^{L-1}\big(x[i+k]\,x[j+k] + n[i+k]\,n[j+k]\big)\right\} \\
&= \lim_{L\to\infty}\left\{\frac{1}{L}\sum_{k=0}^{L-1} x[i+k]\,x[j+k]\right\} + \boldsymbol{\Sigma}(i,j)
\end{aligned}
$$

(because the noise is stationary). Since $\boldsymbol{\Sigma}(i,j)$ is assumed to be known, $\mathbf{R}$ may be
determined by finding the limit of the sum

$$
\begin{aligned}
\frac{1}{L}\sum_{k=0}^{L-1} x[i+k]\,x[j+k]
&= \frac{1}{L}\sum_{k=0}^{L-1}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n \cos(\omega_m(i+k)+\phi_m)\cos(\omega_n(j+k)+\phi_n) \\
&= \frac{1}{L}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n \sum_{k=0}^{L-1}\cos(\omega_m(i+k)+\phi_m)\cos(\omega_n(j+k)+\phi_n)
\end{aligned}
$$


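as $L$ goes to infinity. The critical element here is the inner sum over $k$. To simplify
this sum, we will use the fact that, for $0 \le \alpha < 2\pi$,

$$
\lim_{L\to\infty}\frac{1}{L}\sum_{k=0}^{L-1}\cos(\alpha k + \beta) =
\begin{cases}
\cos(\beta), & \alpha = 0 \\
0, & \alpha \ne 0.
\end{cases}
$$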

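By bringing the limit and the coefficient $1/L$ inside the first two summations, and
expanding the product of cosines as a sum of cosines, we find that the limit of the
inner summation is given by

$$
\begin{aligned}
\lim_{L\to\infty}&\left\{\frac{1}{L}\sum_{k=0}^{L-1}\cos(\omega_m(i+k)+\phi_m)\cos(\omega_n(j+k)+\phi_n)\right\} \\
&= \lim_{L\to\infty}\Bigg\{\frac{1}{2L}\sum_{k=0}^{L-1}\cos\big(\omega_m(i+k)-\omega_n(j+k)+(\phi_m-\phi_n)\big) \\
&\qquad\qquad + \frac{1}{2L}\sum_{k=0}^{L-1}\cos\big(\omega_m(i+k)+\omega_n(j+k)+(\phi_m+\phi_n)\big)\Bigg\} \\
&= \lim_{L\to\infty}\left\{\frac{1}{2L}\sum_{k=0}^{L-1}\cos\big(\omega_m i-\omega_n j+(\omega_m-\omega_n)k+(\phi_m-\phi_n)\big)\right\} \\
&= \begin{cases}
\frac{1}{2}\cos\big(\omega_m(i-j)+(\phi_m-\phi_n)\big), & \omega_m = \omega_n \\
0, & \omega_m \ne \omega_n.
\end{cases}
\end{aligned}
$$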


Since we have assumed that the sinusoids in the signal are distinct, if $\omega_m = \omega_n$, then
$m$ must be equal to $n$, and so $\phi_m = \phi_n$ also. The exact temporal autocorrelation of
the model signal is therefore given by

$$\mathbf{R}(i,j) = \frac{1}{2}\sum_{n=1}^{N} a_n^2 \cos(\omega_n |i-j|) + \boldsymbol{\Sigma}(i,j). \tag{4.3}$$

The temporal autocorrelation is therefore a Toeplitz matrix, despite the fact that
the underlying signals are nonstationary; as mentioned above, it is the same as the
autocorrelation of the stationary semi-deterministic signal model.
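Equation (4.3) can be checked numerically with a short NumPy sketch. The helper names `temporal_autocorrelation` and `time_average` are hypothetical, the noise-free case is assumed so that only the signal term is tested, and the sinusoid parameters are those of the test case used later in this chapter.

```python
import numpy as np

def temporal_autocorrelation(amps, omegas, M, noise_cov=None):
    """Exact temporal autocorrelation of equation (4.3) as an M x M Toeplitz matrix."""
    idx = np.arange(M)
    lags = np.abs(idx[:, None] - idx[None, :])
    R = sum(0.5 * a**2 * np.cos(w * lags) for a, w in zip(amps, omegas))
    if noise_cov is not None:
        R = R + noise_cov
    return R

def time_average(y, i, j, L):
    """Finite-L version of the time average in equation (4.1)."""
    k = np.arange(L)
    return np.mean(y[i + k] * y[j + k])

amps, omegas, phases = [1.0, 1.0], [1.88496, 2.01062], [0.3, -0.4]
M, L = 4, 200_000
k = np.arange(L + M)
x = sum(a * np.cos(w * k + p) for a, w, p in zip(amps, omegas, phases))  # noise-free data

R = temporal_autocorrelation(amps, omegas, M)
gap = abs(time_average(x, 0, 3, L) - R[0, 3])   # shrinks as L grows
```

The phases enter the data but not `R`, which is the phase independence that makes the temporal autocorrelation Toeplitz.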

Autocorrelation  Matrix  Estimates 

There are several techniques for estimating the autocorrelation matrix from
a data record; three of the most widely used methods are described below: the
covariance method, which produces a symmetric positive definite estimate $\mathbf{R}_C$ of the
temporal autocorrelation $\mathbf{R}$, the biased Toeplitz estimate, which produces a symmetric
positive definite Toeplitz estimate $\mathbf{R}_B$, and the unbiased Toeplitz estimate, which
produces a symmetric Toeplitz estimate $\mathbf{R}_U$. Although the Toeplitz estimators are
described as “biased” or “unbiased,” these descriptions apply only when they are
used to estimate the autocorrelation of a stationary Gaussian signal. When the signal
consists of deterministic sinusoids in noise, all of these estimators, including the
covariance method, are biased estimates of both the temporal autocorrelation $\mathbf{R}$ and
the statistical autocorrelation, for finite data records.

In  the  following  sections,  a brief  description  of  each  of  these  estimators  will  be 
presented.  The  terminology  established  in  previous  chapters  will  be  retained;  thus, 
the  number  of  samples  in  the  data  record  y[k]  will  be  denoted  by  L,  the  dimension 
of the autocorrelation estimate by $M$, and the actual number of (complex) sinusoidal
signals by $N = 2N_R$.

The  Covariance  Estimate 

The most widely used method of autocorrelation matrix estimation for subspace
techniques is the so-called “covariance” method. Despite its name, the covariance
method produces an estimate of the autocorrelation, not the covariance. The
elements of the covariance estimate $\mathbf{R}_C$ are given by

$$\mathbf{R}_C(i,j) = \frac{1}{L-M+1}\sum_{k=0}^{L-M} y[i+k]\,y[j+k]. \tag{4.4}$$

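A convenient representation of $\mathbf{R}_C$ is as the product of the $(L - M + 1) \times M$ Toeplitz
data matrix

$$
\mathbf{Y} =
\begin{bmatrix}
y[M-1] & y[M-2] & y[M-3] & \cdots & y[0] \\
y[M] & y[M-1] & y[M-2] & \cdots & y[1] \\
y[M+1] & y[M] & y[M-1] & \cdots & y[2] \\
\vdots & \vdots & \vdots & & \vdots \\
y[L-1] & y[L-2] & y[L-3] & \cdots & y[L-M]
\end{bmatrix}
\tag{4.5}
$$

and its transpose, that is,

$$\mathbf{R}_C = \mathbf{Y}^T\mathbf{Y}. \tag{4.6}$$

From this representation, it is clear that $\mathbf{R}_C$ is positive semidefinite.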
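The data-matrix form of the covariance estimate can be sketched as follows. The function name `covariance_estimate` and the random test data are assumptions for illustration, and the $1/(L-M+1)$ scaling of (4.4) is applied explicitly here, on the assumption that (4.6) leaves it implicit.

```python
import numpy as np

def covariance_estimate(y, M):
    """Covariance-method estimate built from the Toeplitz data matrix of (4.5)."""
    y = np.asarray(y, dtype=float)
    rows = y.size - M + 1
    # Row r of the data matrix is [y[M-1+r], y[M-2+r], ..., y[r]].
    idx = (M - 1 + np.arange(rows))[:, None] - np.arange(M)[None, :]
    Y = y[idx]
    # Assumption: the 1/(L-M+1) factor of (4.4) is applied here.
    return Y.T @ Y / rows, Y

rng = np.random.default_rng(0)
y = rng.standard_normal(50)               # hypothetical data record, L = 50
R_c, Y = covariance_estimate(y, M=4)
min_eig = np.linalg.eigvalsh(R_c).min()   # nonnegative up to rounding
```

Because the columns of `Y` run from the newest sample to the oldest, entry `(i, j)` of `R_c` averages the products `y[M-1-i+k] * y[M-1-j+k]`, that is, the matrix of (4.4) with its rows and columns in reversed order.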

By comparing the definition of $\mathbf{R}_C$ in equation (4.4) with the temporal
autocorrelation $\mathbf{R}$ of (4.1), we can see that for any finite $M$, the limit of $\mathbf{R}_C$ as $L$ goes to
infinity is $\mathbf{R}$. Expanding $\mathbf{R}_C$ as

$$
\begin{aligned}
\mathbf{R}_C(i,j) = \frac{1}{L-M+1}\Bigg\{
&\sum_{k=0}^{L-M} x[i+k]\,x[j+k]
+ \sum_{k=0}^{L-M} n[i+k]\,x[j+k] \\
&+ \sum_{k=0}^{L-M} x[i+k]\,n[j+k]
+ \sum_{k=0}^{L-M} n[i+k]\,n[j+k]\Bigg\},
\end{aligned}
$$

and comparing each term with the corresponding term of equation (4.2), we can also
see that there are two distinct sources of error in the covariance estimate. First, there
is an error that results from the finite sample length, because the first summation
term of the estimate is not equal to the temporal autocorrelation of the noise-free
signal $x[k]$:

$$
\begin{aligned}
\frac{1}{L-M+1}\sum_{k=0}^{L-M} x[i+k]\,x[j+k]
&= \frac{1}{L-M+1}\sum_{m=1}^{N}\sum_{n=1}^{N} a_m a_n \sum_{k=0}^{L-M}\cos(\omega_m(i+k)+\phi_m)\cos(\omega_n(j+k)+\phi_n) \\
&\ne \frac{1}{2}\sum_{n=1}^{N} a_n^2\cos(\omega_n|i-j|).
\end{aligned}
$$


This  error  occurs  even  when  there  is  no  noise;  it  is  due  solely  to  the  use  of  a finite 
sample  in  calculating  Rc. 

The remaining three terms in the expansion are nonzero only in the presence
of noise. The two cross terms are weighted sums of zero-mean Gaussian random
variables, and so are normally distributed with a variance that decreases linearly
with the number of samples. If $\boldsymbol{\Sigma} = \mathbf{I}$, the mean of the cross terms is zero, but this is
not necessarily the case for colored noise. The estimated noise autocorrelation $\hat{\boldsymbol{\Sigma}}$ is
a matrix random variable that is Wishart distributed with expected value $\boldsymbol{\Sigma}$ [39, p.
249]. The magnitude of the error in $\mathbf{R}_C$ due to these three terms is dependent on
both the number of samples and the magnitude of $\boldsymbol{\Sigma}$.

The  Biased  Toeplitz  Estimate 

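The biased Toeplitz estimate is defined by the equation

$$\mathbf{R}_B(i,j) = \frac{1}{L}\sum_{k=0}^{L-|i-j|-1} y[k]\,y[k+|i-j|]. \tag{4.7}$$

Like the covariance estimate, the biased Toeplitz estimate may be written as a product
of a Toeplitz matrix with its transpose, $\mathbf{R}_B = \mathbf{Y}_B^T\mathbf{Y}_B$, where

$$
\mathbf{Y}_B =
\begin{bmatrix}
y[0] & 0 & 0 & \cdots & 0 \\
y[1] & y[0] & 0 & \cdots & 0 \\
y[2] & y[1] & y[0] & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
y[M-1] & y[M-2] & y[M-3] & \cdots & y[0] \\
\vdots & \vdots & \vdots & & \vdots \\
y[L-1] & y[L-2] & y[L-3] & \cdots & y[L-M] \\
0 & y[L-1] & y[L-2] & \cdots & y[L-M+1] \\
0 & 0 & y[L-1] & \cdots & y[L-M+2] \\
0 & 0 & 0 & \cdots & y[L-M+3] \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & y[L-1]
\end{bmatrix}.
$$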

The biased Toeplitz estimate is closely related to the covariance estimate, because

$$
\mathbf{Y}_B =
\begin{bmatrix}
\mathbf{L} \\
\mathbf{Y} \\
\mathbf{U}
\end{bmatrix},
$$

where $\mathbf{L}$ and $\mathbf{U}$ are $M \times M$ lower and upper triangular matrices, respectively.
Because the biased Toeplitz estimate is the same as the covariance estimate of a data
sequence that has been extended with $M$ zeros at the start and end, it is also
sometimes called the pre- and post-windowed estimate. Since it is an outer product, it is
always positive semidefinite. As the number of samples increases, $\mathbf{R}_B$ approaches $\mathbf{R}_C$,
because the contributions of $\mathbf{L}$ and $\mathbf{U}$ to the product $\mathbf{R}_B = \mathbf{Y}_B^T\mathbf{Y}_B$ become negligible;
its limit as $L$ goes to infinity is therefore $\mathbf{R}$, the same as the covariance estimate.
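The two views of the biased Toeplitz estimate, the lag sums of (4.7) and the product of the windowed data matrix with its transpose, can be compared in a short NumPy sketch. The function names are hypothetical, and the $1/L$ factor of (4.7) is applied explicitly to the matrix product on the assumption that the product form leaves it implicit.

```python
import numpy as np

def biased_toeplitz_estimate(y, M):
    """Biased Toeplitz estimate of equation (4.7)."""
    y = np.asarray(y, dtype=float)
    L = y.size
    r = np.array([np.dot(y[:L - m], y[m:]) / L for m in range(M)])  # lags 0..M-1
    idx = np.arange(M)
    return r[np.abs(idx[:, None] - idx[None, :])]

def windowed_data_matrix(y, M):
    """(L + M - 1) x M data matrix of the pre- and post-windowed sequence."""
    y = np.asarray(y, dtype=float)
    L = y.size
    Yb = np.zeros((L + M - 1, M))
    for c in range(M):
        Yb[c:c + L, c] = y                 # column c is y delayed by c samples
    return Yb

rng = np.random.default_rng(0)
y = rng.standard_normal(40)                # hypothetical data record, L = 40
M = 5
R_b = biased_toeplitz_estimate(y, M)
Yb = windowed_data_matrix(y, M)
R_b_product = Yb.T @ Yb / y.size           # assumption: 1/L applied to the product
min_eig = np.linalg.eigvalsh(R_b).min()
```

The agreement of `R_b` and `R_b_product` is what guarantees positive semidefiniteness: the estimate is a scaled Gram matrix of the zero-padded data.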

The  Unbiased  Toeplitz  Estimate 

The unbiased Toeplitz estimate is identical to the biased Toeplitz estimate,
except that the scaling of the estimate as a function of the lag $|i - j|$ has been
modified so that, for a stationary Gaussian process, the estimate is unbiased. The
entries of the autocorrelation matrix estimate are given by

$$\mathbf{R}_U(i,j) = \frac{1}{L-|i-j|}\sum_{k=0}^{L-|i-j|-1} y[k]\,y[k+|i-j|]. \tag{4.8}$$

For a fixed $M$, the unbiased estimate also converges to the temporal autocorrelation
as $L$ goes to infinity, because the effect of the change in scaling approaches zero.
Unlike the previous two estimators, however, the unbiased Toeplitz estimate is not
necessarily positive definite for finite data records.
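The possible loss of definiteness can be seen on a very small example. The sketch below is an illustration under stated assumptions: the function name `toeplitz_estimate` is hypothetical, and the four-sample record is contrived so that the largest lag is estimated from a single product.

```python
import numpy as np

def toeplitz_estimate(y, M, unbiased):
    """Toeplitz estimate: equation (4.8) if unbiased, equation (4.7) otherwise."""
    y = np.asarray(y, dtype=float)
    L = y.size
    r = np.array([np.dot(y[:L - m], y[m:]) / ((L - m) if unbiased else L)
                  for m in range(M)])
    idx = np.arange(M)
    return r[np.abs(idx[:, None] - idx[None, :])]

# Contrived record: lag 3 is estimated from the single product y[0] * y[3] = 1,
# which exceeds the lag-0 estimate (1 + 0 + 0 + 1) / 4 = 0.5.
y = np.array([1.0, 0.0, 0.0, 1.0])
R_u = toeplitz_estimate(y, 4, unbiased=True)
R_b = toeplitz_estimate(y, 4, unbiased=False)
min_eig_unbiased = np.linalg.eigvalsh(R_u).min()   # -0.5: indefinite
min_eig_biased = np.linalg.eigvalsh(R_b).min()     #  0.25: positive definite
```

Here the unbiased estimate has an off-diagonal entry larger than its diagonal, which no positive semidefinite matrix allows, while the biased estimate of the same data remains positive definite.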

Comparison  of  Autocorrelation  Estimators 

In  the  sections  above,  we  have  shown  that  the  subspace  methods  with  a 
Toeplitz  estimate  of  the  autocorrelation  matrix  have  the  same  asymptotic  perfor- 
mance as  the  versions  that  use  a covariance  estimate,  because  each  of  these  estimates 
approaches  the  same  limit  as  the  data  record  length  L goes  to  infinity.  Still  left  unan- 
swered is  the  question  of  how  well  a subspace  method  will  perform  using  Toeplitz 
estimates  derived  from  finite  data  records.  The  first  step  in  answering  this  question  is 
to  compare  the  performance  of  the  autocorrelation  estimators  as  a function  of  matrix 
dimension,  data  record  length,  and  signal-to-noise  ratio. 

To provide a common test case for all three estimators, a set of 1000 data
records was generated; each data record contained samples of a process made up
of 2 sinusoidal signals ($a_1 = 1$, $\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$, $\omega_2 = 2.01062$,
$\phi_2 = -0.4$), in additive white Gaussian noise of variance $\sigma^2 = 0.01$. For each record,
autocorrelation matrix estimates of dimensions $5 \le M \le 195$ were generated using
each of the three techniques described above. The difference (in the 2-norm) between
each estimate and the exact temporal autocorrelation, $\|\hat{\mathbf{R}} - \mathbf{R}\|_2$, was then calculated
for each of the estimates.
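An experiment of this kind can be sketched as follows. This is not the code used to produce the figures: the function names, the default of 20 trials instead of 1000, the three sample values of $M$, and the random seed are all assumptions chosen to keep the example short.

```python
import numpy as np

def lag_matrix(M):
    idx = np.arange(M)
    return np.abs(idx[:, None] - idx[None, :])

def covariance_est(y, M):
    rows = y.size - M + 1
    Y = y[np.arange(rows)[:, None] + np.arange(M)[None, :]]   # row k is y[k..k+M-1]
    return Y.T @ Y / rows

def toeplitz_est(y, M, unbiased):
    L = y.size
    r = np.array([np.dot(y[:L - m], y[m:]) / ((L - m) if unbiased else L)
                  for m in range(M)])
    return r[lag_matrix(M)]

def mean_errors(L=200, dims=(5, 50, 195), trials=20, seed=0):
    """Mean 2-norm error of the (covariance, biased, unbiased) estimates for each M."""
    rng = np.random.default_rng(seed)
    amps, omegas, phases = [1.0, 1.0], [1.88496, 2.01062], [0.3, -0.4]
    sigma2 = 0.01
    k = np.arange(L)
    x = sum(a * np.cos(w * k + p) for a, w, p in zip(amps, omegas, phases))
    out = {}
    for M in dims:
        lags = lag_matrix(M)
        # Exact temporal autocorrelation (4.3) with white noise of variance sigma2.
        R = sum(0.5 * a**2 * np.cos(w * lags) for a, w in zip(amps, omegas))
        R = R + sigma2 * np.eye(M)
        errs = np.zeros(3)
        for _ in range(trials):
            y = x + np.sqrt(sigma2) * rng.standard_normal(L)
            ests = (covariance_est(y, M),
                    toeplitz_est(y, M, unbiased=False),
                    toeplitz_est(y, M, unbiased=True))
            errs += np.array([np.linalg.norm(E - R, 2) for E in ests])
        out[M] = errs / trials
    return out

results = mean_errors()
```

Each entry of `results` holds the mean absolute errors in the order covariance, biased, unbiased; dividing by `np.linalg.norm(R, 2)` inside the loop would give the relative error instead.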

The first set of estimates was for a data record length of $L = 200$. The results
are shown in Figures 4.1 and 4.2; the unbiased Toeplitz estimator $\mathbf{R}_U$ is consistently
the most accurate of the three estimators, and the biased Toeplitz estimator $\mathbf{R}_B$
is more accurate than the covariance estimator $\mathbf{R}_C$ when the size of the estimated
autocorrelation matrix is small. The covariance estimate approaches the accuracy of
the unbiased Toeplitz estimate for intermediate values of $M$, before the effects of the
small data record length reduce its accuracy again for $M \approx L$.

When  the  size  of  the  autocorrelation  matrix  approaches  the  number  of  points 
in  the  data  record  (in  this  case,  200),  estimation  of  the  autocorrelation  at  large  lags  is 
performed  using  only  a small  number  of  samples.  This  causes  large  errors  in  matrix 
elements  far  from  the  main  diagonal.  The  covariance  estimator  is  particularly  sensitive 
to  this  effect,  as  indicated  by  the  dramatic  increase  in  error  as  M approaches  200. 
The  biased  Toeplitz  estimator  suffers  much  less  from  this  problem,  and  the  unbiased 
Toeplitz  estimator  shows  only  a slight  increase  for  M = 195,  the  largest  matrix 
considered.  The  increasing  error  in  the  biased  Toeplitz  estimator  for  large  M is  due 
to  bias  in  its  estimates  of  large  lags,  not  to  an  increase  in  their  variance. 

When a larger number of samples is available, the differences between the
three estimators are less pronounced. Figures 4.3 and 4.4 show the accuracy of the
estimators for the same signals and noise power, but for a data record length of
$L = 2000$. Again, the unbiased estimator is uniformly more accurate over the entire
range of $M$. As the theoretical analysis in an earlier section predicted, each of the
estimators has become more accurate as more data are used in its computation.

Figure 4.1: Mean Absolute Error $\|\hat{\mathbf{R}} - \mathbf{R}\|_2$ for the Covariance (solid line), Biased
(dotted line), and Unbiased (dashed line) Estimators for $L = 200$

Figure 4.2: Mean Relative Error $\|\hat{\mathbf{R}} - \mathbf{R}\|_2/\|\mathbf{R}\|_2$ for the Covariance (solid line),
Biased (dotted line), and Unbiased (dashed line) Estimators for $L = 200$

Figure 4.3: Mean Absolute Error $\|\hat{\mathbf{R}} - \mathbf{R}\|_2$ for the Covariance (solid line), Biased
(dotted line), and Unbiased (dashed line) Estimators for $L = 2000$

This  comparison  has  been  presented  for  a single  example,  but  similar  numerical 
experiments  have  been  conducted  for  a wide  range  of  signals,  and  the  conclusions 
presented  here  remain  valid.  These  experiments  indicate  that  the  unbiased  Toeplitz 
estimator  is  the  most  accurate  of  the  three  methods  considered,  in  terms  of  the  error 
in  the  2-norm,  when  estimating  autocorrelation  matrices  of  signals  consistent  with 
the  deterministic  signal  model. 

Although  the  unbiased  Toeplitz  estimator  is  a superior  estimator  of  the  au- 
tocorrelation matrix,  the  subspace  methods  require  estimates  of  invariant  subspaces, 
and  it  is  not  necessarily  true  that  a more  accurate  estimate  of  the  autocorrelation 
matrix  produces  a more  accurate  estimate  of  its  invariant  subspaces.  In  the  following 
section,  we  will  examine  the  relation  between  the  accuracy  of  a matrix  estimate  and 
the  accuracy  of  subspace  estimates. 

Effects  of  Errors  in  Autocorrelation  Estimation 

We  have  seen  that  the  various  estimators  of  the  autocorrelation  matrix  have 
differing  error  magnitudes  under  different  conditions.  In  order  to  assess  the  impact 
of  these  errors,  it  is  necessary  to  determine  the  effect  of  a perturbation  of  the  auto- 
correlation matrix  on  its  invariant  subspaces.  Since  the  temporal  autocorrelation  of 
the  deterministic  signal  model  is  Toeplitz,  and  since  the  subspace  techniques  produce 
exact  frequency  estimates  when  the  temporal  autocorrelation  is  used  as  input,  an 
estimate  that  produces  a Toeplitz  matrix  close  to  the  temporal  autocorrelation  will 
result  in  good  frequency  estimates. 

Figure 4.4: Mean Relative Error $\|\hat{\mathbf{R}} - \mathbf{R}\|_2/\|\mathbf{R}\|_2$ for the Covariance (solid line),
Biased (dotted line), and Unbiased (dashed line) Estimators for $L = 2000$

It is possible to bound the effects of errors in estimating the autocorrelation on
the accuracy of the estimated subspaces; for this purpose, a distance measure between
subspaces is necessary. It is important to recognize that we do not want to measure
the distance between matrices that span the subspaces. For example, matrices whose
columns are different bases of one subspace
all span the same subspace, even though the matrix norm of the difference between
any two of them is nonzero. A proper measure of the distance between two
$l$-dimensional subspaces of $n$-dimensional space is provided by the $k$ canonical angles
$\pi/2 \ge \Theta_1 \ge \Theta_2 \ge \ldots \ge \Theta_k \ge 0$ between the subspaces, where $k = l$ if $l \le n/2$, and
$k = n - l$ if $l > n/2$ [40,41]. Often, the $n \times n$ diagonal matrix of canonical angles
$\boldsymbol{\Theta} = \mathrm{diag}(\Theta_1, \Theta_2, \ldots, \Theta_k, 0, \ldots, 0)$ is used to represent the difference between two
subspaces. The canonical angles are zero if and only if the two subspaces are
identical, so they can be used to identify identical subspaces. In addition, the sines of the
canonical angles may be used as distance measures between subspaces. In particular,
$\sin\Theta_1(\mathcal{X},\mathcal{Y})$, the sine of the largest canonical angle between two subspaces $\mathcal{X}$ and
$\mathcal{Y}$, possesses all the properties required of a distance: $\sin\Theta_1(\mathcal{X},\mathcal{Y}) = 0$ if and only
if $\mathcal{X} = \mathcal{Y}$, $\sin\Theta_1(\mathcal{X},\mathcal{Y}) = \sin\Theta_1(\mathcal{Y},\mathcal{X})$, and $\sin\Theta_1(\mathcal{X},\mathcal{Y}) = D_1$, $\sin\Theta_1(\mathcal{Y},\mathcal{Z}) = D_2$
implies that $\sin\Theta_1(\mathcal{X},\mathcal{Z}) \le D_1 + D_2$ (the triangle inequality).
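Canonical angles are commonly computed from the singular values of the product of two orthonormal bases. The sketch below is an illustration with assumptions: the function name `canonical_angles` is hypothetical, both inputs are assumed to have full column rank and the same number of columns, and the small matrices are illustrative examples rather than those of the original text.

```python
import numpy as np

def canonical_angles(A, B):
    """Canonical angles, largest first, between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                # orthonormal basis for span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)   # descending
    return np.arccos(np.clip(cosines, 0.0, 1.0))[::-1]

# Two different bases of the same plane in 3-space (illustrative values).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])
# A basis of a different plane that shares one direction with the first.
C = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

matrix_distance = np.linalg.norm(A - B, 2)   # nonzero although the spans coincide
same_plane = canonical_angles(A, B)          # both angles are zero up to rounding
other_plane = canonical_angles(A, C)         # largest angle is pi/2
```

The matrix norm separates `A` from `B` even though they span one subspace, while the canonical angles do not. Note that `arccos` loses accuracy for very small angles, so a sine-based formula is preferable when the angles themselves are tiny.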

With  a distance  measure  selected,  we  may  proceed  to  examine  the  effects  of 
errors  in  the  autocorrelation  matrix  estimate  on  the  invariant  subspaces  of  the  matrix. 
The following two theorems, known as the Sin $\Theta$ and Sin $2\Theta$ theorems, are due to
Davis and Kahan [40]. The forms of the two theorems given here have been translated
into the terminology used in Chapter 3, and specialized to deal only with the norm
of the perturbation. The original work contains far more than the small fragment
presented here; more complete forms of these theorems will be presented in
Chapter 6, where they will serve as the basis for error detection in an eigendecomposition
algorithm.

Theorem 4.1 (Sin $\Theta$ Theorem - Perturbation Form) Let $\mathbf{R}$ be the $M \times M$
temporal autocorrelation matrix of a real signal composed of $N_R = N/2$ deterministic
sinusoids of powers $P_1 \ge P_2 \ge \ldots \ge P_{N_R} > 0$, in additive white Gaussian noise of
variance $\sigma^2$. Since $\mathbf{R}$ must be positive semidefinite, its eigendecomposition may be
expressed as

$$\mathbf{R} = [\mathbf{E}_S, \mathbf{E}_N]\,\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N, \sigma^2, \ldots, \sigma^2)\,[\mathbf{E}_S, \mathbf{E}_N]^T,$$

where the columns of the $M \times N$ matrix $\mathbf{E}_S$ span the exact signal subspace $\mathcal{S}$, and the
columns of the $M \times (M - N)$ matrix $\mathbf{E}_N$ span the exact noise subspace $\mathcal{N}$. (Because
$\mathbf{R}$ is the exact autocorrelation, $\lambda_{2k-1} = \lambda_{2k} = P_k + \sigma^2$ for $1 \le k \le N_R$.)

In addition, let $\hat{\mathbf{R}} = \mathbf{R} + \mathbf{H}$ be a symmetric approximation to the exact
autocorrelation matrix, and express the eigendecomposition of $\hat{\mathbf{R}}$ by

$$\hat{\mathbf{R}} = [\hat{\mathbf{E}}_S, \hat{\mathbf{E}}_N]\,\mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_N, \hat{\lambda}_{N+1}, \ldots, \hat{\lambda}_M)\,[\hat{\mathbf{E}}_S, \hat{\mathbf{E}}_N]^T,$$

where $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \ldots \ge \hat{\lambda}_M$, the $N$ columns of $\hat{\mathbf{E}}_S$ span the approximate signal subspace
$\hat{\mathcal{S}}$, and the $M - N$ columns of $\hat{\mathbf{E}}_N$ span the approximate noise subspace $\hat{\mathcal{N}}$.

Then:

$$\|\sin\boldsymbol{\Theta}(\mathcal{S}, \hat{\mathcal{S}})\|_2 = \sin\Theta_1(\mathcal{S}, \hat{\mathcal{S}}) \le \frac{\|\mathbf{H}\|_2}{\hat{\lambda}_N - \sigma^2},$$

and

$$\|\sin\boldsymbol{\Theta}(\mathcal{N}, \hat{\mathcal{N}})\|_2 = \sin\Theta_1(\mathcal{N}, \hat{\mathcal{N}}) \le \frac{\|\mathbf{H}\|_2}{P_{N_R} + \sigma^2 - \hat{\lambda}_{N+1}}.$$

The Sin $\Theta$ theorem illustrates an important property of invariant subspaces:
the effect of a perturbation, in this case $\mathbf{H}$, depends not only on the magnitude
of the perturbation, but also on the amount of separation between the eigenvalues
associated with the exact invariant subspace and the other eigenvalues of the matrix.
An intuitive insight into this connection can be developed by considering the problem
in signal processing terms. If the power in the weakest signal, $P_{N_R}$, is much larger
than the power in the noise $\sigma^2$, then we would expect the performance of a frequency
estimation technique to be relatively insensitive to small errors in estimating the
autocorrelation. On the other hand, if the power in the smallest signal is comparable
to the noise power, a small error in estimating the autocorrelation can result in the
smallest signal being mistaken for noise, and a component of the noise being taken
for a signal.
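The dependence on both the perturbation size and the eigenvalue separation can be observed numerically. The sketch below assumes the bound in the form $\|\mathbf{H}\|_2$ divided by the gap between the exact eigenvalues of one subspace and the estimated eigenvalues of the complementary subspace; the matrix size, the signal eigenvalues, and the perturbation scale are hypothetical values chosen only so that both gaps are positive.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma2 = 8, 2, 0.1
signal_eigs = np.array([3.1, 2.1])                  # hypothetical, both above sigma2

# Exact matrix: N signal eigenvalues, M - N noise eigenvalues equal to sigma2.
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))
lam = np.concatenate([signal_eigs, np.full(M - N, sigma2)])
R = Q @ np.diag(lam) @ Q.T
E_S = Q[:, :N]                                      # exact signal subspace basis

# Symmetric perturbation H and the perturbed matrix.
G = rng.standard_normal((M, M))
H = 0.02 * (G + G.T)
lam_hat, V = np.linalg.eigh(R + H)                  # ascending order
lam_hat, V = lam_hat[::-1], V[:, ::-1]              # reorder to descending
E_S_hat = V[:, :N]                                  # estimated signal subspace basis

# Sine of the largest canonical angle between exact and estimated signal subspaces.
cosines = np.linalg.svd(E_S.T @ E_S_hat, compute_uv=False)
sin_theta1 = np.sqrt(max(0.0, 1.0 - cosines.min() ** 2))

H_norm = np.linalg.norm(H, 2)
bound_noise_gap = H_norm / (lam_hat[N - 1] - sigma2)    # exact noise vs estimated signal
bound_signal_gap = H_norm / (lam[N - 1] - lam_hat[N])   # exact signal vs estimated noise
```

Since the signal and noise subspaces are orthogonal complements of equal codimension, the same `sin_theta1` applies to both pairs of subspaces. Shrinking the smaller signal eigenvalue toward `sigma2` makes both bounds grow for the same `H`.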

The Sin $\Theta$ theorem exhibits these effects in perturbation bounds that are ratios
of the size of the error, $\|\mathbf{H}\|_2$, to the separation between the eigenvalues associated
with one of the exact subspaces and those associated with the complementary
estimated subspace. It is reasonable to assume that, if the error is small, the difference
between the eigenvalues of the exact and estimated subspaces will be small also. This
assumption is confirmed by the following theorem [41, p. 203]:

Theorem 4.2 Let $\mathbf{R}$, $\hat{\mathbf{R}}$, and $\mathbf{H}$ and their eigenvalues be as defined in Theorem 4.1.
Then, for any unitarily invariant norm,

$$\|\mathrm{diag}(\lambda_1, \ldots, \lambda_M) - \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_M)\| \le \|\mathbf{H}\|.$$

In particular, for the 2-norm,

$$\|\mathrm{diag}(\lambda_1, \ldots, \lambda_M) - \mathrm{diag}(\hat{\lambda}_1, \ldots, \hat{\lambda}_M)\|_2 = \max_i |\lambda_i - \hat{\lambda}_i| \le \|\mathbf{H}\|_2.$$
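The 2-norm form of this eigenvalue bound is easy to confirm numerically. The following sketch uses hypothetical random matrices; the only requirement is that both sets of eigenvalues are sorted in the same order before they are paired.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10
A = rng.standard_normal((M, M))
R = A @ A.T                                 # symmetric "exact" matrix (hypothetical)
B = rng.standard_normal((M, M))
H = 0.1 * (B + B.T)                         # symmetric perturbation

lam = np.linalg.eigvalsh(R)                 # ascending
lam_hat = np.linalg.eigvalsh(R + H)         # ascending, so the pairing is consistent
max_shift = np.max(np.abs(lam - lam_hat))
H_norm = np.linalg.norm(H, 2)
```

In every such trial `max_shift` stays at or below `H_norm`: no eigenvalue moves by more than the 2-norm of the perturbation.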

From this theorem and the Sin $\Theta$ theorem, we can see that the sensitivity
of an invariant subspace of a symmetric matrix to perturbations is determined by
the smallest distance between any eigenvalue associated with that subspace and any
eigenvalue associated with the complementary subspace. In the context of subspace
signal processing methods, this means that the accuracy to which the signal or noise
subspace may be determined from an estimate of the autocorrelation matrix depends
both on the accuracy of the autocorrelation estimate and on the signal-to-noise ratio
of the weakest signal.

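The Sin $\Theta$ theorem assumes that the eigenvalues of both the exact temporal
autocorrelation matrix $\mathbf{R}$ and the estimate $\hat{\mathbf{R}}$ are known. In many cases, only the
eigenvalues of $\hat{\mathbf{R}}$ are available. In this case, a somewhat weaker bound is provided by
the Sin $2\Theta$ theorem:

Theorem 4.3 (Sin $2\Theta$ Theorem - Perturbation Form) Let the matrices $\mathbf{R}$,
$\hat{\mathbf{R}}$, and $\mathbf{H}$, their eigendecompositions, and their invariant subspaces be as defined in
Theorem 4.1. Then

$$\|\sin 2\boldsymbol{\Theta}(\mathcal{S}, \hat{\mathcal{S}})\|_2 = \sin 2\Theta_1(\mathcal{S}, \hat{\mathcal{S}}) \le \frac{2\|\mathbf{H}\|_2}{\hat{\lambda}_N - \hat{\lambda}_{N+1}}$$

and

$$\|\sin 2\boldsymbol{\Theta}(\mathcal{N}, \hat{\mathcal{N}})\|_2 = \sin 2\Theta_1(\mathcal{N}, \hat{\mathcal{N}}) \le \frac{2\|\mathbf{H}\|_2}{\hat{\lambda}_N - \hat{\lambda}_{N+1}}.$$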

As $\|\mathbf{H}\|_2$ approaches zero, the Sin $\Theta$ and Sin $2\Theta$ theorems are asymptotically
equivalent, since $\sin\Theta_1$ approaches $\frac{1}{2}\sin 2\Theta_1$ as $\Theta_1$ approaches zero. When the
perturbation is large, however, the bound given by the Sin $2\Theta$ theorem is not as strong,
since a small $\sin 2\Theta_1$ can also occur for $\Theta_1$ near $\pi/2$. This situation occurs because,
without knowledge of the exact eigenvalues, it is impossible to be certain that the
eigenvectors chosen to span the approximate subspace are associated with the correct
eigenvalues. How big a perturbation is required before there is a danger of mistaking
an eigenvalue associated with the noise subspace for one associated with the signal
subspace, or vice versa? An answer is provided by the following theorem [40, p. 37]:

Theorem 4.4 Let $\mathbf{R}$ and $\hat{\mathbf{R}}$, their eigendecompositions, and their invariant
subspaces be as defined in Theorem 4.1. If $\|\mathbf{H}\|_2 = \|\hat{\mathbf{R}} - \mathbf{R}\|_2 < P_{N_R}/2$, then $\Theta_1 < \pi/4$.

In essence, this theorem shows that if the error in the matrix estimate is
small, it is impossible for an eigenvector belonging to one subspace to be mistaken for
another; by comparing this bound with Theorem 4.2, we can see that the condition
$\|\mathbf{H}\|_2 < P_{N_R}/2$ guarantees that $\hat{\lambda}_N > \lambda_{N+1}$ and $\lambda_N > \hat{\lambda}_{N+1}$, so that the eigenvalues
that determine membership in the two subspaces remain in the correct order; this
means that the eigenvectors that define the subspaces can still be properly identified
from the perturbed eigenvalues.

Unfortunately, bounds involving only the error $\|H\|$ are relatively blunt
instruments; there are many situations where the error in the subspace estimate is far
less than the upper limit set by applying the Sin Θ and Sin 2Θ theorems to $\|H\|$.
For example, $A$ and $A + \alpha I$ have identical invariant subspaces, but the norm of their
difference is $|\alpha|$, which can be arbitrarily large. The bound provided by the Sin Θ
and Sin 2Θ theorems applies in one direction only: although the invariant subspaces
of  two  matrices  must  be  close  if  the  norm  of  their  difference  is  small,  a large  difference 
does  not  imply  that  the  invariant  subspaces  are  far  apart.  This  means  that,  knowing 
||H||,  we  can  bound  the  error  in  the  subspace  estimates;  we  cannot,  however,  assert 
that  a better  estimate  of  R must  necessarily  lead  to  a better  estimate  of  the  subspace. 

Comparison  of  Frequency  Estimates 

In  this  section,  the  performance  of  the  three  autocorrelation  matrix  estimators 
is  compared  on  the  basis  of  their  accuracy  in  frequency  estimation.  Several  numerical 
experiments were conducted using the root-MUSIC and TLS-ESPRIT methods for
frequency estimation; although the results shown here were computed using
TLS-ESPRIT, the results for MUSIC are very similar, and lead to essentially the same
conclusions. In the following chapters, fast methods will be described for performing
frequency estimation using Toeplitz estimates of the autocorrelation matrix; for these
examples, however, the same standard $O(M^3)$ computational techniques were used
for both Toeplitz and non-Toeplitz matrices.

From  our  analysis  of  autocorrelation  estimation  techniques,  we  would  expect 
the  performance  penalty  for  the  use  of  Toeplitz  autocorrelation  estimators  to  be  small 
when there are many samples in the input data record, since the Toeplitz and
covariance estimators approach the same limit for large data records. We would
also expect the differences between the performance of the different estimators
to decrease as the signal-to-noise ratio decreases, because the contribution of the
Toeplitz noise autocorrelation to the overall estimate becomes larger. We will see
below that these expectations are borne out in practice.

The signals and noise are the same as used earlier to compare the accuracy
of the autocorrelation matrix estimates ($a_1 = 1$, $\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$,
$\omega_2 = 2.01062$, $\phi_2 = -0.4$, $\sigma^2 = 0.01$). In this section we will examine the bias
and variance in the frequency estimate $\hat{\omega}_1$ when the various autocorrelation matrix
estimates are used. The mean squared error in the estimate $\hat{\omega}_1$ is shown in Figure 4.5.
(Although  the  Bhattacharyya  bound  limits  the  variance  of  an  estimate,  not  the  mean 
squared  error,  it  is  also  shown  to  provide  an  indication  of  the  quality  of  the  estimate.) 

The mean squared error in the frequency estimate for each of the three
matrix estimators shows important differences from the norm of the error in $\hat{R}$ shown
in Figure 4.2. For small autocorrelation matrices, the error in all three frequency
estimates decreases rapidly with the size of the autocorrelation matrix, even though
the relative error of each matrix estimate $\|\hat{R} - R\|_2$ is increasing slowly. Each of the
estimates shown uses the entire input data record to calculate $\hat{R}$, so the Cramer-Rao
and Bhattacharyya bounds are the same for any $M$. However, a qualitative argument
based on the statistical bounds provides a justification for the rapid improvement of
the quality of the estimate as the size of $\hat{R}$ increases.

Figure 4.5: Mean Squared Error in $\hat{\omega}_1$ for the Covariance (solid line), Biased (dotted
line), and Unbiased (dashed line) Estimators for ESPRIT with L = 200 (horizontal axis:
Matrix Dimension)

In Chapter 2, we remarked that the Cramer-Rao bound for the variance of
an estimator given $L$ consecutive (coherent) samples of a signal decreased
asymptotically as $1/L^3$, while the bound for an estimator given the same number of samples
in $K$ separate data records of length $L_{sep} = L/K$ (the multiple snapshot case in
array processing) decreased asymptotically as $1/(K L_{sep}^3)$. An autocorrelation matrix of
dimension $M$ contains information on lags from $-(M-1)$ to $M-1$, and therefore
implicitly “windows” the input data; if this window is much smaller than the length of
the data record, the benefit of the long coherent sampling interval is lost, and the
estimate may be much worse than the optimum given by the statistical bounds.
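
As a short worked check of the two asymptotic rates quoted above (a sketch only: constant factors are omitted, and $L_{sep} = L/K$ denotes the length of each separate record), substituting the record length into the multiple-record rate gives

$$\frac{1}{K L_{sep}^3} = \frac{1}{K\,(L/K)^3} = \frac{K^2}{L^3}.$$

Under these assumptions, dividing a record of fixed total length $L$ into $K$ separately processed segments raises the asymptotic bound by a factor of roughly $K^2$ relative to the coherent case, which is the loss that a small autocorrelation window risks incurring.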

From Figure 4.5, we can see that an autocorrelation matrix estimate whose
dimensions are roughly 1/4 of the data record length ($M > L/4$) is required before
the variance of the estimate approaches the statistical bounds. Because the
computational cost of a subspace estimate using conventional eigendecomposition techniques
is $O(M^3)$, this implies that approaching the statistical bounds with conventional
methods requires a very large amount of computation for long data records.

The  bias  in  each  of  the  frequency  estimates  is  shown  in  Figure  4.6.  Most  of 
the  error  using  the  Toeplitz  estimates  of  R is  due  to  bias;  the  bias  reaches  a plateau 
when  the  dimension  of  the  autocorrelation  matrix  is  approximately  1/8  of  the  length 
of  the  input  data  record,  and  remains  relatively  constant  thereafter.  For  all  but  very 
small  autocorrelation  matrices,  the  unbiased  Toeplitz  matrix  produces  a smaller  bias 
in  the  frequency  estimate  than  the  biased  Toeplitz  matrix.  The  notch  that  appears 
at  M = 27  is  a coherent  effect  whose  location  depends  on  the  frequency  and  phase  of 
the  signals;  although  it  appears  consistently,  its  exact  location  is  difficult  to  predict, 
so it would be difficult to select $M$ to take advantage of this notch. The estimates
using the covariance method are effectively unbiased once a certain threshold $M$ is
reached; this threshold is approximately 1/10 of the length of the data record.

Figure 4.6: Bias of $\hat{\omega}_1$ for the Covariance (solid line), Biased (dotted line), and
Unbiased (dashed line) Estimators for ESPRIT with L = 200

The variance of the estimates using each of the three autocorrelation matrix
estimates is shown in Figure 4.7. The covariance estimate approaches the
Bhattacharyya bound once $M > L/4$, and remains close to the bound until finite data
record effects increase the error for $M \approx L$. The variance for the Toeplitz estimates
is only slightly greater than the variance using the covariance estimates, and is much
less affected by the finite data record when $M \approx L$. In fact, both of the Toeplitz
estimates slightly surpass the Bhattacharyya bound for $M \approx L$; this is possible because
the bound applies to unbiased estimators, and both the Toeplitz-based estimators are
biased.

A final aspect of the performance of these estimators is their resolution. The
test case used above has $\Delta f = 0.02$, while $L = 200$, so that the frequency separation
is a factor of four larger than the classical resolution limit $1/L$. Since the major
advantage of the subspace techniques over simpler classical approaches is their high
resolution, it is important to determine the effects of using Toeplitz autocorrelation
estimates on the resolution. Figure 4.8 shows the mean squared error for each of the
three estimators, with an input signal whose frequencies are separated by $\Delta f = 0.001$,
which is a factor of five smaller than the classical resolution limit. The other signal
parameters are identical to the previous cases ($a_1 = 1$, $\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$,
$\omega_2 = 1.89124$, $\phi_2 = -0.4$, $\sigma^2 = 0.01$). As before, the Bhattacharyya bound is shown
for reference.
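
The relation between the angular frequencies listed above and the quoted separations can be checked with a few lines of Python. This is only an illustrative sketch: the function name `frequency_separation` is hypothetical, and it assumes that the listed frequencies are angular frequencies in radians per sample, so that the separation in cycles per sample is their difference divided by 2π.

```python
import math


def frequency_separation(omega_1, omega_2):
    """Separation in cycles per sample between two angular frequencies
    given in radians per sample (assumed units)."""
    return abs(omega_2 - omega_1) / (2.0 * math.pi)


L = 200                        # data record length used in the experiments
classical_limit = 1.0 / L      # classical resolution limit 1/L

wide = frequency_separation(1.88496, 2.01062)    # first test case
narrow = frequency_separation(1.88496, 1.89124)  # closely spaced test case

ratio_wide = wide / classical_limit       # how far above the limit
ratio_narrow = classical_limit / narrow   # how far below the limit
```

With these inputs the first separation is about four times the classical limit and the second about one fifth of it, in agreement with the figures quoted in the text.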

The covariance estimator, as expected, resolves the signals once the
autocorrelation matrix size exceeds $L/4$. The biased estimator also resolves the signals, but
with a large bias, as can be seen from Figure 4.9, which shows the variance. Note that
both of the Toeplitz estimates, which are biased, exceed the Bhattacharyya bound,
while the covariance estimate does not approach the bound.

Figure 4.7: Variance of $\hat{\omega}_1$ for the Covariance (solid line), Biased (dotted line), and
Unbiased (dashed line) Estimators for ESPRIT with L = 200

Figure 4.8: Mean Squared Error of $\hat{\omega}_1$ for the Covariance (solid line), Biased (dotted
line), and Unbiased (dashed line) Estimators for ESPRIT with Δf = 0.001 and
L = 200 (horizontal axis: Matrix Dimension)

Figure 4.9: Variance of $\hat{\omega}_1$ for the Covariance (solid line), Biased (dotted line), and
Unbiased (dashed line) Estimators for ESPRIT with Δf = 0.001 and L = 200

Although the biased estimator has the largest mean squared error, it has the
lowest  variance  of  any  of  the  estimators.  In  fact,  the  biased  estimator  overestimates 
the  separation  of  the  two  sinusoids  by  approximately  a factor  of  two.  This  behavior 
is  very  sensitive  to  the  phase  and  frequency  of  the  two  signals;  in  other  similar  cases, 
the  biased  estimator  has  failed  to  resolve  the  two  signals.  The  unbiased  estimator, 
by contrast, consistently exhibits the behavior shown: complete failure to resolve the
signals until $M \approx L/2$, followed by an abrupt transition to high resolution.

It  is  clear  from  these  results  that  the  subspace  estimators  retain  their  high 
resolution  when  Toeplitz  estimators  of  the  autocorrelation  matrix  are  used,  provided 
that  the  estimators  and  the  matrix  size  are  properly  chosen.  Although  the  biased 
estimator  can  exhibit  high  resolution,  as  in  the  previous  example,  it  also  occasionally 
fails  to  resolve  signals  which  are  resolved  with  the  covariance  estimate.  The  unbiased 
estimator is more reliable, but when it is used, it is necessary that $M > L/2$, to
ensure  that  high  resolution  is  achieved.  When  these  guidelines  are  followed,  the  mean 
squared  error  of  the  Toeplitz-based  subspace  techniques  may  be  quite  close  to  the 
error  when  the  covariance  estimate  is  used. 

Based  on  the  results  shown  above,  we  may  conclude  that,  for  a given  data 
record  length  L and  autocorrelation  matrix  dimension  M,  the  covariance  estimate 
$R_c$ produces the most accurate frequency estimates. In fact, if $M > L/4$, then
estimates computed using $R_c$ are practically unbiased, and their variance approaches the
minimum possible variance given by the Bhattacharyya bound. However, computing
invariant subspaces when the covariance estimate is used requires $O(M^3)$
floating-point operations, and since approaching the Bhattacharyya bound requires that $M$
is at least $L/4$, this performance comes at a very high computational cost, especially
for large data records.

By using a Toeplitz estimate of the autocorrelation matrix, we can dramatically
reduce the computational cost of frequency estimation. In cases where the
signal-to-noise ratio is moderate to low, or when there are many samples in the input data,
the  frequency  estimation  performance  with  a Toeplitz  estimate  can  be  very  close  to 
that  obtained  with  a covariance  estimate.  Conversely,  when  the  input  data  record 
is  short  and  the  signal-to-noise  ratio  high,  the  performance  penalty  will  be  greater. 
Although  the  accuracy  of  the  estimates  obtained  will  be  somewhat  less  than  if  a 
covariance  estimate  of  equal  dimension  were  used,  when  the  estimates  are  compared 
on  the  basis  of  equal  computational  cost,  the  estimate  computed  using  a Toeplitz 
autocorrelation  matrix  may  be  equal  or  even  superior  to  an  estimate  using  a smaller 
covariance  estimate  of  the  autocorrelation  matrix.  A detailed  comparison  of  accuracy 
and  speed  is  presented  in  Chapter  9,  where  we  will  see  that  this  is  often  the  case. 


CHAPTER  5 

FAST  ALGORITHMS  FOR  SOLVING  TOEPLITZ  EQUATIONS 


Although solution of a general system of $n$ linear equations requires $O(n^3)$
floating-point operations, solution of a Toeplitz system may be performed much more
efficiently. There are several fast algorithms for solving various types of Toeplitz
systems, including the conventional “fast” $O(n^2)$ methods such as the Levinson-Durbin
and Trench algorithms [37], and a group of newer “superfast” methods, which are
$O(n \log^2 n)$ [42-45].

Because  the  fast  eigendecomposition  algorithms  to  be  discussed  in  the  following 
chapter  rely  for  their  efficiency  on  a fast  method  for  solving  Toeplitz  equations,  this 
chapter  provides  a brief  review  of  the  fast  and  superfast  methods  for  solving  Toeplitz 
systems.  As  in  previous  chapters,  the  algorithms  are  summarized  using  MATLAB-style 
notation. 


Solving the Yule-Walker Equation

One of the simplest and most common Toeplitz systems is the Yule-Walker
equation, a special system that arises in linear prediction problems. If an $n \times n$ Toeplitz
matrix $T_n$ is formed from the autocorrelation of a stationary random process:

$$T_n = \begin{bmatrix} t_0 & t_1 & \cdots & t_{n-1} \\ t_1 & t_0 & \cdots & t_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ t_{n-1} & t_{n-2} & \cdots & t_0 \end{bmatrix},$$

where $t_i$ is the autocorrelation at lag $i$, then the optimum least-squares linear predictor
for the process is given by the solution of the system

$$\begin{bmatrix} t_0 & t_1 & \cdots & t_{n-1} \\ t_1 & t_0 & \cdots & t_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ t_{n-1} & t_{n-2} & \cdots & t_0 \end{bmatrix} \begin{bmatrix} a(1) \\ a(2) \\ \vdots \\ a(n) \end{bmatrix} = -\begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{bmatrix} \tag{5.1}$$

or

$$T_n a = -t.$$

This system of equations is completely determined by $n + 1$ coefficients, $t_0, \ldots, t_n$.
Toeplitz systems of the form given by equation (5.1) are called Yule-Walker equations,
and are sometimes written in the alternate form

$$\begin{bmatrix} t_0 & t^T \\ t & T_n \end{bmatrix} \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} E_n \\ 0 \end{bmatrix},$$

where $E_n$ is the residual, or minimum prediction error. If $T_{n+1}$ is positive definite,
the prediction errors for each of the Yule-Walker problems of increasing size defined
by the leading principal submatrices of $T_{n+1}$ form a decreasing sequence
$E_0 \ge E_1 \ge \cdots \ge E_n > 0$; the zero-order residual $E_0$ is equal to $t_0$.

The Levinson-Durbin Algorithm

The most widely used method for solution of the Yule-Walker equations is the
Levinson-Durbin algorithm, which computes the solution by solving a nested series
of Yule-Walker equations for $T_1, T_2, \ldots, T_n$, using the fact that the solution of a
problem of size $n + 1$ can be computed from the solution of a problem of size $n$. If
$T_n a_n = -t$, then

$$T_{n+1} = \begin{bmatrix} T_n & Ft \\ t^T F & t_0 \end{bmatrix}, \qquad a_{n+1} = \begin{bmatrix} a_n + \gamma_{n+1} F a_n \\ \gamma_{n+1} \end{bmatrix},$$

where $\gamma_{n+1}$ is the reflection coefficient of order $n + 1$ and $F$ is the “flip” matrix,
with ones on its antidiagonal, and zeros elsewhere; multiplying a vector by $F$ reverses
the order of its coefficients. The solution to the larger
problem may be easily computed by using this relation, as shown in Algorithm 5.1
below. A complete discussion of the Levinson-Durbin algorithm can be found in
many sources; see, for example, Golub [37, p. 183].


Algorithm 5.1: Levinson-Durbin Algorithm

function [a, e] = levinson(t, n)
(dimensions: a(1 : n), e(0 : n), t(0 : n))
a(1) = -t(1)/t(0)
γ_1 = a(1)
β = t(0)
e(0) = β
for k = 1 : n - 1
    β = (1 - γ_k^2) * β
    e(k) = β
    γ_{k+1} = -(t(k + 1) + a(1 : k)^T * t(k : -1 : 1))/β
    a(1 : k) = a(1 : k) + γ_{k+1} * a(k : -1 : 1)
    a(k + 1) = γ_{k+1}
end
e(n) = (1 - γ_n^2) * e(n - 1)

Algorithm 5.1 computes the solution vector and the prediction errors at a cost
of $2n^2 + 3n$ floating-point operations. The stability of the Levinson-Durbin algorithm
depends on the magnitude of all of the reflection coefficients being less than one;
this is equivalent to requiring that all the prediction errors $E_i$ be greater than zero,
which will occur if and only if $T$ is positive definite.
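
The recursion of Algorithm 5.1 can be sketched in plain Python as follows. This is an illustrative transcription rather than the implementation used for the results in this work: the function name `levinson_durbin` is hypothetical, indices are shifted to Python's zero-based convention, and the sketch assumes $n \ge 1$ and that no intermediate prediction error is zero.

```python
def levinson_durbin(t):
    """Solve the Yule-Walker system T_n a = -[t_1, ..., t_n] for a symmetric
    Toeplitz matrix defined by t = [t_0, ..., t_n].

    Returns (a, e): the solution a (length n) and the prediction errors
    e = [E_0, ..., E_n].  Assumes n >= 1 and nonzero prediction errors.
    """
    n = len(t) - 1
    a = [0.0] * n
    e = [0.0] * (n + 1)
    gamma = -t[1] / t[0]          # first reflection coefficient
    a[0] = gamma
    beta = t[0]                   # current prediction error
    e[0] = beta
    for k in range(1, n):
        beta = (1.0 - gamma * gamma) * beta
        e[k] = beta
        # inner product a(1:k)^T t(k:-1:1), written with zero-based indices
        dot = sum(a[i] * t[k - i] for i in range(k))
        gamma = -(t[k + 1] + dot) / beta
        # update the first k coefficients with the reversed old solution
        a[:k] = [a[i] + gamma * a[k - 1 - i] for i in range(k)]
        a[k] = gamma
    e[n] = (1.0 - gamma * gamma) * e[n - 1]
    return a, e
```

For example, `levinson_durbin([3.0, 1.0, 0.5, 0.2])` returns a three-element solution whose residual in equation (5.1) is zero to rounding error, together with four positive prediction errors, as expected for a positive definite matrix.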

A useful interpretation of the process of solving a Toeplitz system is provided
by expressing it in terms of operations on polynomial matrices. The solution vector
$a$ of the Yule-Walker problem defines a polynomial

$$A_k(z) = 1 + \sum_{i=1}^{k} a(i) z^i$$

and a reversed polynomial

$$\tilde{A}_k(z) = z^k A_k(z^{-1}) = z^k + \sum_{i=1}^{k} a(i) z^{k-i}.$$

In terms of these polynomials, the $k$th step of the Levinson-Durbin algorithm is given
by

$$\begin{bmatrix} A_k(z) \\ \tilde{A}_k(z) \end{bmatrix} = \begin{bmatrix} 1 & \gamma_k z \\ \gamma_k & z \end{bmatrix} \begin{bmatrix} A_{k-1}(z) \\ \tilde{A}_{k-1}(z) \end{bmatrix}.$$

The initial values $A_0(z)$ and $\tilde{A}_0(z)$ are $1$ and $z^0$, respectively, so the solution to the
Yule-Walker problem is given by

$$\begin{bmatrix} A_n(z) \\ \tilde{A}_n(z) \end{bmatrix} = \begin{bmatrix} 1 & \gamma_n z \\ \gamma_n & z \end{bmatrix} \cdots \begin{bmatrix} 1 & \gamma_1 z \\ \gamma_1 & z \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{5.2}$$

This way of expressing Toeplitz algorithms will be useful in illustrating the similarities
between the Levinson-Durbin algorithm and the other approaches described in later
sections.

The Yule-Walker Prediction Error

The Levinson-Durbin algorithm computes the vector $[E_0, E_1, \ldots, E_n]$ of
prediction errors for each of the Yule-Walker systems of size $0, 1, \ldots, n$ that are solved
in the course of computing the answer. This prediction error vector is important in
eigendecomposition algorithms because it defines a diagonal matrix that is congruent
to $T_{n+1}$ [46,47]:

$$A_{n+1}^T T_{n+1} A_{n+1} = \mathrm{diag}(E_n, E_{n-1}, \ldots, E_0),$$

where

$$A_{n+1} = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ a_n(1) & 1 & \cdots & 0 & 0 \\ a_n(2) & a_{n-1}(1) & \cdots & 0 & 0 \\ a_n(3) & a_{n-1}(2) & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_n(n) & a_{n-1}(n-1) & \cdots & a_1(1) & 1 \end{bmatrix}$$

and $a_k$ is the solution to the Yule-Walker problem of size $k$.

Two important uses of the prediction error sequence are based on this
congruence relation. First, the product of the prediction errors is the determinant of $T_{n+1}$,
because $\det(A_{n+1}) = \det(A_{n+1}^T) = 1$:

$$\det(T_{n+1}) = \det\left(A_{n+1}^{-T}\, \mathrm{diag}(E_n, \ldots, E_0)\, A_{n+1}^{-1}\right) = \frac{\det(\mathrm{diag}(E_n, \ldots, E_0))}{\det(A_{n+1}^T)\det(A_{n+1})} = \prod_{i=0}^{n} E_i.$$


The second use of the prediction errors is based on the Sylvester inertia theorem
[37, p. 416], which states that, if $X = P^T Y P$, where $X$ and $Y$ are symmetric and $P$
is nonsingular, then $X$ and $Y$ have the same number of positive, negative, and zero
eigenvalues. The matrix $A_{n+1}$ is unit lower triangular, so it is obviously nonsingular.
Because the eigenvalues of a diagonal matrix are just the diagonal elements, the
number of positive, negative, and zero eigenvalues of $T_{n+1}$ may be determined by
simply counting the number of positive, negative, and zero $E_i$. Both these properties
will  be  extremely  useful  in  the  fast  eigendecomposition  algorithms  given  in  the  next 
chapter. 
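
Both uses of the prediction errors can be illustrated with a small Python sketch. The function names `prediction_errors`, `toeplitz_determinant`, and `toeplitz_inertia` are hypothetical, and the sketch assumes that every leading principal submatrix is nonsingular, so that no intermediate prediction error is exactly zero and the recursion never divides by zero.

```python
def prediction_errors(t):
    """Prediction errors [E_0, ..., E_n] of the symmetric Toeplitz matrix
    T_{n+1} defined by t = [t_0, ..., t_n], via the Levinson-Durbin recursion.
    Assumes all leading principal submatrices are nonsingular."""
    n = len(t) - 1
    e = [float(t[0])]
    a = []                                    # current Yule-Walker solution
    for k in range(n):
        dot = sum(a[i] * t[k - i] for i in range(k))
        gamma = -(t[k + 1] + dot) / e[k]      # reflection coefficient
        a = [a[i] + gamma * a[k - 1 - i] for i in range(k)] + [gamma]
        e.append((1.0 - gamma * gamma) * e[k])
    return e


def toeplitz_determinant(t):
    """det(T_{n+1}) as the product of the prediction errors."""
    det = 1.0
    for value in prediction_errors(t):
        det *= value
    return det


def toeplitz_inertia(t):
    """(positive, negative, zero) eigenvalue counts of T_{n+1}, obtained by
    counting the signs of the prediction errors (Sylvester inertia theorem)."""
    e = prediction_errors(t)
    positive = sum(1 for value in e if value > 0)
    negative = sum(1 for value in e if value < 0)
    return positive, negative, len(e) - positive - negative
```

For the indefinite matrix defined by `t = [1.0, 2.0, 0.5]`, the prediction errors are 1, -3, and 13/12, so the determinant is -3.25 and the matrix has two positive eigenvalues and one negative eigenvalue.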

The  Schur  Algorithm 

An alternative method for solving the Yule-Walker equation is the Schur
algorithm, which takes a somewhat different approach to the calculation of the solution.
Algorithm 5.2 below is the Schur algorithm, which takes as input a pair of vectors
$\alpha = t(1:n)$ and $\beta = t(0:n-1)$, and computes vectors $u$ and $v$ such that

$$\begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} 0 \\ u \end{bmatrix} + \begin{bmatrix} v \\ 0 \end{bmatrix}. \tag{5.3}$$

Like the Levinson-Durbin algorithm, this algorithm also computes the sequence of
prediction errors $E_i$. The properties and derivation of the Schur algorithm are
discussed in much greater detail by Kailath [48,49] and Therrien [50, p. 444].

Algorithm 5.2: Schur Algorithm

function [u, v, e] = schur(α, β, n)
(dimensions: u(1 : n), v(1 : n), e(0 : n), α(1 : n), β(1 : n))
u(1 : n) = 0
v(1) = 1
v(2 : n) = 0
γ_0 = 0
for k = 1 : n


a(k  — 1)  = —a(k) 

kk  - 1)  = m 

a(k-  2: -1:0) 
P(k-  1 : -1  : 1) 
7 k = «(0)//?(0) 
u(l  : k) 
v(l  : k ) 
e(A:  — 1)  = 

0(0)  = (1 

end 

e(n)  = m 


1 

— 7fc-i 

a(k  — 1 

i 

i-H 

1 

1 

1 

-7fe-l 

1 

0(k  — 1 

-1:1)  J 

= 

u(l  : k) 
u(l  : k ) 

+ 7*  * 

m 

-7 1) 

v(k  : —1 
u(k  : —1 


1) 

1) 


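Using the polynomials defined by

$$U_n(z) = \sum_{i=1}^{n} u(i) z^i, \qquad V_n(z) = \sum_{i=1}^{n} v(i) z^i,$$

$$\tilde{U}_n(z) = z^n U_n(z^{-1}), \qquad \tilde{V}_n(z) = z^n V_n(z^{-1}),$$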


the  result  of  the  Schur  algorithm  may  be  expressed  as 


U„(z)  vn(z) 

n 

= n 

1 7** 

RW  un(z) 

i 

_ 7 k z 

(5,4) 


By combining equation (5.4) with the initial conditions given in equation (5.2), it is
easily determined that the solution of the Yule-Walker equation is $A(z) = U(z) + z^{-1}V(z)$,
which is equivalent to equation (5.3). Solution of the Yule-Walker problem
using the Schur algorithm is shown in Algorithm 5.3, which requires $4n^2 + n$
floating-point operations.

Algorithm 5.3: Schur Yule-Walker Solution

function [a, e] = schuryw(t, n)
(dimensions: a(1 : n), e(0 : n), t(0 : n))
[u, v, e] = schur(t(1 : n), t(0 : n - 1), n)
for i = 1 : n - 1
    a(i) = u(i) + v(i + 1)
end
a(n) = u(n)

Although the Schur algorithm is less efficient on a single processor than the
Levinson-Durbin algorithm, it is better suited to parallel implementations, since no
inner products are required in its computation. Much of the current interest springs
from the fact that, using the Schur algorithm, solution of the Yule-Walker equations
on a parallel machine with at least $n$ processors is possible with an execution time of
$O(n)$ [51].

The  “Superfast”  Generalized  Schur  Algorithm 

All of the algorithms discussed to this point are conventional fast algorithms,
which are $O(n^2)$. An efficient $O(n \log^2 n)$ algorithm, based on the Schur algorithm
above, was proposed by Ammar and Gragg [52,53], and independently by de Hoog
[44], and Musicus [54]. Although several $O(n \log^2 n)$ methods for solving the
Yule-Walker equation had been previously advanced [42,43,55], most of these methods
were very inefficient, becoming faster than the conventional $O(n^2)$ algorithms only
for $n$ greater than several thousand. In contrast, the generalized Schur algorithm$^1$ as
implemented by Ammar and Gragg [45,56], requires $8n \log_2^2 n + O(n \log n)$
floating-point operations, making it less complex than the Levinson-Durbin algorithm for
$n > 256$.

$^1$The term “generalized Schur” has also been used by Kailath and others to refer to the Schur
algorithm described in the previous section. In this work, the term generalized Schur algorithm will
be used only to refer to the $O(n \log^2 n)$ algorithm described here; this usage is consistent with that
of Ammar and Gragg.
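
The crossover at $n = 256$ follows from comparing the leading terms of the two operation counts quoted in this chapter. The short Python sketch below does this comparison; the function names are hypothetical, and it deliberately ignores the lower-order $O(n \log n)$ term of the generalized Schur count, whose constant is not given here.

```python
import math


def levinson_leading_flops(n):
    """Leading term 2 n^2 of the Levinson-Durbin operation count."""
    return 2 * n * n


def gsa_leading_flops(n):
    """Leading term 8 n log2(n)^2 of the generalized Schur operation count
    (lower-order terms omitted in this sketch)."""
    return 8 * n * math.log2(n) ** 2


# Leading terms for a few power-of-two problem sizes.
comparison = {n: (levinson_leading_flops(n), gsa_leading_flops(n))
              for n in (64, 256, 1024)}
```

At $n = 256$ both leading terms equal 131072, since $\log_2^2 256 = 64 = 256/4$; for smaller $n$ the conventional count is lower, and for larger $n$ the generalized Schur count is lower.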

The  key  to  the  efficiency  of  the  generalized  Schur  algorithm  is  a fast  method 
for  computing  the  Schur  polynomials  given  by  equation  (5.4).  By  factoring  the  matrix 
produced  by  the  Schur  algorithm, 


UM  v„(z) 
v„(z)  U„(z ) 
the problem has been split in two; the first part is another Schur problem of half the
size, and the second is the computation of $U'$, $V'$, $\tilde{U}'$, and $\tilde{V}'$.

It was shown by de Hoog that the computation of $U'$, $V'$, $\tilde{U}'$, and $\tilde{V}'$ could also
be  transformed  into  a Schur  problem  of  order  n/2,  whose  input  could  be  computed 
from  the  original  input  and  the  output  of  the  first  Schur  problem,  and  that  these 
computations  required  only  polynomial  multiplication  and  addition  [44,53].  The 
Schur  problem  of  order  n has  thus  been  split  into  two  Schur  problems  of  order  n/2. 

If $n$ can be factored as $n = 2^k N'$, the splitting can be continued recursively,
until the problem reaches size $N'$, and the conventional Schur algorithm may be used
from that point. Also, below a certain problem size $N_T$, the generalized Schur algorithm
becomes less efficient than the conventional algorithm, and it is more effective to solve
the smaller problems using the conventional algorithm; typically $N_T \approx 64$. In the
de Hoog, Musicus, and Ammar and Gragg implementations, $n$ was restricted to be a
power of two, but this is not necessary; this point is discussed in more detail below.
The resulting procedure is shown in Algorithm 5.4, where the symbol ★ indicates
polynomial multiplication, or convolution.


Algorithm 5.4: Generalized Schur Algorithm

function [u, v, e] = gsa(α, β, n)
(dimensions: u(1 : n), v(1 : n), e(0 : n), α(1 : n), β(1 : n))
if n <= N_T
    [u, v, e] = schur(α, β, n)
else
    m = n/2
    [u_0, v_0, e(0 : m - 1)] = gsa(α(1 : m), β(1 : m), m)

«o(l  : m -t-  1)  = [0,  ito(m  : —1  : 1)] 
n0(l  : m + 1)  = [0,  vo(m  : -1  : 1)] 
z(  1 : n)  = a * no  — /3  * ito 
ai(l  : m)  = z(m  + 1 : n) 

z(  1 : n)  = (3*  vo  — a*  uo 

Pi(l  : m)  — z(m  + 1 : n) 

[tti,vi,e(m  : n - 1)]  = gsa(ai,  /?i,  m) 
it  = iti  * vq  + [ito  * V\ , 0] 
v = iti  ★ ito  + [uo  * v\ , 0] 

end 

e(n) = (1 - u(n)^2) * e(n - 1)

Because polynomial multiplication can be performed using the FFT, the
intermediate computations at each stage are $O(n \log n)$, and the overall algorithm is
$O(n \log^2 n)$. Ammar and Gragg [45] have extensively optimized the algorithm, using
split-radix FFT algorithms [57,58] to perform the convolutions, and rearranging the
computations to take full advantage of the results of earlier recursive steps. A detailed
description of the algorithm and an annotated Fortran program that implements it
are given in Appendix B.

Although the algorithm as described by de Hoog, Ammar, and Gragg is
restricted to solving systems of size $2^k$, it is important to note that the range of sizes
can be greatly expanded by a simple modification. The algorithm recursively splits
the problem until it has been divided into subproblems of size $N_T$ or smaller; as long
as the size of the problem $n$ can be factored as $n = 2^k N'$, where $N' \le N_T$, then the
generalized Schur algorithm can be applied. To achieve the highest possible efficiency,
it is desirable to restrict $N'$ to be a product of small primes, so that a prime factor
FFT algorithm [58,59] may be used to compute the convolutions. This modification
allows the algorithm to be used for a much wider range of sizes, at the cost of a slight
reduction in performance when $n$ is a power of 2. Because it was designed to achieve
the highest possible efficiency, the algorithm given in Appendix B uses a real-valued
split-radix FFT, and so is restricted to $n = 2^k$, but a more flexible algorithm may
be obtained by simply replacing the split-radix FFT with a real-valued prime factor
FFT.
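
The size condition described above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `split_problem_size` is hypothetical, the default threshold of 64 is the typical value of $N_T$ quoted earlier, and the sketch tests only the factorization $n = 2^k N'$ with $N' \le N_T$, not whether $N'$ is a product of small primes.

```python
def split_problem_size(n, n_t=64):
    """Halve n while it exceeds the threshold n_t and remains even.

    Returns (k, n_prime) with n == 2**k * n_prime and n_prime <= n_t when
    such a factorization exists, or None when an odd size larger than n_t
    is reached, in which case the recursive splitting cannot be completed.
    """
    k = 0
    while n > n_t and n % 2 == 0:
        n //= 2
        k += 1
    return (k, n) if n <= n_t else None
```

For instance, a problem of size 768 is split four times into subproblems of size 48, while a problem of size 1000 stops at the odd size 125, which exceeds the threshold.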

Performance  Comparison 

When the complexity of algorithms for Yule-Walker solution is measured in
terms of floating-point operations, the Levinson-Durbin and generalized Schur
algorithms have equal complexity for matrices of size $n = 256$. In practice, the better
locality of reference of the Levinson-Durbin algorithm, compared to the FFTs used in
the generalized Schur algorithm, gives the Levinson-Durbin algorithm a performance
advantage that is not overcome until $n \approx 512$. The execution time for a single solution
of the Yule-Walker equations on a general-purpose processor is shown in Figure 5.1.

The poor locality of reference of the FFT handicaps the generalized Schur
algorithm somewhat on general-purpose processors. Microprocessors designed for digital
signal  processing,  however,  are  optimized  for  maximum  performance  on  the  FFT.  The 
results  of  profiling  the  generalized  Schur  algorithm  indicate  that  it  would  be  ideally 
suited  for  implementation  on  these  processors,  where  its  efficiency  advantage  over  the 
Levinson-Durbin  algorithm  should  be  even  greater  than  indicated  by  Figure  5.1. 


Figure 5.1: Execution Time (s) versus Matrix Dimension for the Levinson-Durbin
algorithm (solid line) and the generalized Schur algorithm (circles)

General Toeplitz Systems

All of the algorithms described to this point solve a very restricted type of
Toeplitz system, the Yule-Walker equation. Although fast $O(n^2)$ methods, such as
Trench’s algorithm [37], are available for solving general Toeplitz systems, in many
cases the systems encountered in signal processing can be solved more efficiently by
making use of two important properties of Toeplitz matrices.

Toeplitz Matrix-Vector Products

Because the multiplication of a vector by a Toeplitz matrix is equivalent to
convolution, the product of a vector with a Toeplitz matrix may be efficiently
computed using the fast Fourier transform [59,60]. If the elements of the first row and
column of a Toeplitz matrix $T_n$ are contained in the vectors $t_{row}$ and $t_{col}$, the vector
$y = T_n x$ is computed by the following algorithm.


Algorithm 5.5: Toeplitz Matrix-Vector Multiplication

function y = tmvmul(t_row, t_col, x, n)
    (dimensions: y(1:n), t_row(0:n-1), t_col(0:n-1), x(1:n))
    choose N > 2n + 1, a convenient length for the FFT
    t(1:N) = [0, ..., 0, t_row(n-1:-1:1), t_col(0:n-1)]
    x(1:N) = [x(1:n), 0, ..., 0]
    r = fft(t, N)
    X = fft(x, N)
    for i = 1:N
        C(i) = r(i) * X(i)
    end
    y = ifft(C, N)
    y = y(N-n+1:N)

If $T_n$ and x are real, the split-radix algorithm may be used to compute the
forward and inverse FFTs, at a total cost of about $6N \log_2 N - 11N$ floating-point
operations [57-59]. Matrix-matrix multiplication where one of the matrices is Toeplitz
may be performed even more efficiently, since the transform r computed from $T_n$ need
only be computed once.
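The steps of Algorithm 5.5 can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used in this work: the helper names `fft` and `toeplitz_matvec` are hypothetical, a simple recursive radix-2 FFT stands in for the split-radix real FFT (so the transform length is rounded up to a power of two), and the padding uses N >= 2n - 1, which is already sufficient to keep the circular convolution from wrapping around.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def toeplitz_matvec(t_row, t_col, x):
    """Return y = T x, where T[i][j] = t_col[i-j] if i >= j else t_row[j-i]."""
    n = len(x)
    size = 1
    while size < 2 * n - 1:          # power-of-two FFT length (assumption of this sketch)
        size *= 2
    # Layout of Algorithm 5.5: zeros, reversed first row (without t_row[0]), first column.
    t = [0.0] * (size - (2 * n - 1)) + list(reversed(t_row[1:])) + list(t_col)
    x_padded = list(x) + [0.0] * (size - n)
    spectrum = [p * q for p, q in zip(fft(t), fft(x_padded))]
    conv = [v / size for v in fft(spectrum, inverse=True)]
    # The last n entries of the circular convolution hold the product.
    return [conv[k].real for k in range(size - n, size)]
```

For the 3 x 3 matrix with first row (4, -2, 3) and first column (4, 1, 0.5), calling `toeplitz_matvec([4.0, -2.0, 3.0], [4.0, 1.0, 0.5], [1.0, 2.0, 3.0])` should agree with the direct product (9, 3, 14.5) up to rounding error.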

The  Gohberg-Semencul  Relation 

Using the solution of the Yule-Walker equation and the fast Toeplitz multiplication
algorithm given above, it is often possible to solve a general Toeplitz system
with only $O(n \log n)$ additional floating-point operations [60], using a property of
nonsingular Toeplitz matrices derived by Gohberg and Semencul [61].


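Theorem 5.1 (Gohberg-Semencul) Let $T_{n+1}$ and its largest leading principal
submatrix $T_n$ be nonsingular symmetric Toeplitz matrices, and let a be the solution to
the Yule-Walker problem

\[ T_{n+1} \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} E_n \\ 0 \end{bmatrix} \]

with $E_n \neq 0$. Then $T_{n+1}$ is nonsingular, and its inverse is given by

\[ T_{n+1}^{-1} = \frac{1}{E_n} \left( L L^T - M M^T \right), \]

where

\[ L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ a(1) & 1 & 0 & \cdots & 0 \\ a(2) & a(1) & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a(n) & a(n-1) & a(n-2) & \cdots & 1 \end{bmatrix}, \qquad
M = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ a(n) & 0 & \cdots & 0 & 0 \\ a(n-1) & a(n) & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a(1) & a(2) & \cdots & a(n) & 0 \end{bmatrix}. \]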
Because L and M are Toeplitz, their matrix-vector products may be computed
efficiently using the FFT. This allows the general Toeplitz problem $T_{n+1} x = y$, where
$T_{n+1}$ satisfies the requirements of the Gohberg-Semencul relation, to be solved by
computing

\[ T_{n+1}^{-1} y = \frac{1}{E_n} \left( L L^T - M M^T \right) y, \]

using Algorithm 5.5 to compute the matrix-vector products. Thus, a general Toeplitz
system may be solved in $O(n \log n)$ operations once the solution to the Yule-Walker
problem is known. Since the superfast algorithms can compute a Yule-Walker solution
in $O(n \log^2 n)$ operations, a general Toeplitz system can be solved without increasing
the asymptotic complexity of the superfast algorithms.
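As a rough illustration of the solution procedure just described, the following Python sketch solves a symmetric Toeplitz system through the Gohberg-Semencul relation. It rests on several assumptions that are not part of the original text: the helper names (`durbin`, `lower_toeplitz_matvec`, `gs_solve`) are hypothetical, the Yule-Walker solution is obtained with a plain Levinson-Durbin recursion, and the triangular Toeplitz products are formed directly in $O(n^2)$ operations for readability. In a fast implementation each of the four products would instead be computed with the FFT, as in Algorithm 5.5.

```python
def durbin(t):
    """Solve T [1; a] = [E; 0] for the symmetric Toeplitz T with first column t."""
    a, err = [], t[0]
    for k in range(1, len(t)):
        acc = t[k] + sum(a[j] * t[k - 1 - j] for j in range(k - 1))
        refl = -acc / err                      # reflection coefficient
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl               # updated prediction error
    return a, err


def lower_toeplitz_matvec(col, v, transpose=False):
    """Multiply v by the lower-triangular Toeplitz matrix with first column col."""
    n = len(v)
    if transpose:
        return [sum(col[j - i] * v[j] for j in range(i, n)) for i in range(n)]
    return [sum(col[i - j] * v[j] for j in range(i + 1)) for i in range(n)]


def gs_solve(t, y):
    """Return x with T x = y, using T^{-1} = (L L^T - M M^T) / E."""
    a, err = durbin(t)
    l_col = [1.0] + a            # first column of L: 1, a(1), ..., a(n)
    m_col = [0.0] + a[::-1]      # first column of M: 0, a(n), ..., a(1)
    ll = lower_toeplitz_matvec(l_col, lower_toeplitz_matvec(l_col, y, True))
    mm = lower_toeplitz_matvec(m_col, lower_toeplitz_matvec(m_col, y, True))
    return [(p - q) / err for p, q in zip(ll, mm)]
```

The sketch requires every leading principal submatrix to be nonsingular, since the recursion divides by each intermediate prediction error.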

The form of the Gohberg-Semencul relation given above has been specialized
to the case of symmetric nonsingular Toeplitz matrices. Although this is sufficient for
our needs, versions exist that are more general. The Gohberg-Semencul relation has a
general form for nonsymmetric Toeplitz matrices [61]; an extension of the relation to
the case where $T_{n+1}$ may be singular has been given by Zhong [62]; and the possibility
of a more efficient solution for the singular case has been suggested by Heinig and
Hellinger [63]. The approach described in these works produces $T_{n+1}^{-1}$ if $T_{n+1}$ is
nonsingular, and the Moore-Penrose inverse $T_{n+1}^{+}$ in the singular case. However, the
computation required when $T_{n+1}$ is singular is several times that required in the
nonsingular case, which makes it impractical for use in the algorithms described in
Chapter 6.


Numerical  Properties  of  Toeplitz  Algorithms 

In  this  section,  we  will  discuss  the  numerical  properties  of  the  fast  Toeplitz 
algorithms,  that  is,  the  effect  of  finite  precision  calculations  on  the  accuracy  of  the 
results.  This  section  first  briefly  reviews  the  concepts  of  stability  and  condition  as 
they  apply  to  the  solution  of  Toeplitz  systems,  and  then  examines  the  stability  of  the 
three  algorithms  discussed  above. 

Stability  and  Condition 

In the context of solving linear equations, a stable algorithm is one that, when
solving the equation $Ax = b$, produces a computed solution $\hat{x}$ that is the exact
solution of a nearby problem $\hat{A}\hat{x} = \hat{b}$. By "nearby," we mean that $\hat{A}$ is close to
A, and $\hat{b}$ is close to b; typically, a matrix norm is used to measure closeness. It is
important to note that stability does not ensure that $\hat{x}$ is close to x; this requires not
only that the algorithm be stable, but also that the original problem $Ax = b$ be well
conditioned.

A system  is  well  conditioned  when  small  changes  in  A and  b result  in  small 
changes  in  the  exact  solution  x;  conversely,  a system  is  ill  conditioned  when  small 
changes  in  A and  b can  cause  large  changes  in  the  exact  solution.  Because  rounding 
errors  are  almost  unavoidable,  an  ill-conditioned  system  is  difficult  to  solve  accurately 
with  any  algorithm,  even  a very  stable  one;  once  a small  amount  of  error  has  crept 
into  the  computation,  the  system  has  effectively  been  perturbed,  and  if  the  system  is 
ill  conditioned,  the  computed  answer  may  differ  dramatically  from  the  exact  solution. 

A quantitative measure of how much change can occur in the solution due to
a change in the inputs is provided by the condition number $\kappa$. If $Ax = b$ is the
exact system, and the computed solution $x + \Delta x$ is the exact solution to a perturbed
problem $(A + \Delta A)(x + \Delta x) = b + \Delta b$, then the error in the solution $\Delta x$ satisfies

\[ \frac{\|\Delta x\|}{\|x\|} \le \kappa \left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|} \right) + O\!\left( \frac{\|\Delta A\|^2}{\|A\|^2}, \frac{\|\Delta b\|^2}{\|b\|^2} \right), \]

so the output error is no more than $\kappa$ times the size of the perturbations, as long
as the perturbations are small enough that the higher-order terms may be ignored.
The condition number $\kappa$ measures the amount by which errors in the input may
be magnified in the solution; if $\epsilon$ is the relative size of the error introduced by a
floating-point computation², then even a perfectly stable algorithm can be expected to
compute a solution with a relative error of $\kappa\epsilon$. A thorough discussion of the concepts
of stability and condition has been given by Bunch [64], among many others.
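A small numerical experiment makes the role of the condition number concrete. The sketch below is an illustration under assumed values, not an example from the original text: it uses a hypothetical nearly singular 2 x 2 system, perturbs the right-hand side slightly, and compares the observed error magnification with the infinity-norm condition number computed from the exact inverse.

```python
def solve2(a, b):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


delta, eta = 1e-6, 1e-8                       # assumed sizes, chosen for illustration
A = [[1.0, 1.0], [1.0, 1.0 + delta]]
b = [2.0, 2.0 + delta]
x = solve2(A, b)                              # close to [1, 1]
x_pert = solve2(A, [b[0], b[1] + eta])        # right-hand side perturbed by eta

rel_in = eta / max(abs(v) for v in b)
rel_out = max(abs(p - q) for p, q in zip(x_pert, x)) / max(abs(v) for v in x)
amplification = rel_out / rel_in              # observed magnification of the input error

# Infinity-norm condition number, from the exact inverse (1/delta) [[1+delta, -1], [-1, 1]].
kappa = (2.0 + delta) * (2.0 + delta) / delta
```

With these values a relative perturbation of about 5e-9 in b changes the solution by about one percent, a magnification of roughly 2e6, which stays below the condition number of roughly 4e6, as the bound requires.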


Stability  of  Fast  Toeplitz  Algorithms 

The stability of the conventional $O(n^2)$ methods for solving Toeplitz systems
of equations has been considered by Cybenko [65] and Bunch [66]. Because the conventional
Levinson-Durbin and Schur algorithms solve a series of successively larger
Yule-Walker problems, both of these algorithms will be unstable whenever any of the
leading principal submatrices of $T_n$ is ill conditioned [65]. It is possible for a leading
principal submatrix to be ill conditioned even though $T_n$ is well conditioned; for
example,

\[ \begin{bmatrix} 1 & 3 & 1-\epsilon & -1 \\ 3 & 1 & 3 & 1-\epsilon \\ 1-\epsilon & 3 & 1 & 3 \\ -1 & 1-\epsilon & 3 & 1 \end{bmatrix} \]

is well conditioned, with $\kappa \approx 2.6$, but the leading 3 x 3 principal submatrix has
condition number $\kappa \approx 5.8/\epsilon$ [67]. This indefinite matrix is solved accurately by
general $O(n^3)$ algorithms, but not by the fast Toeplitz algorithms.

² For most modern computers, a single elementary floating-point operation produces results that
are as close as possible to the exact answer, given the limitations of floating-point representation. In
this case, $\epsilon$ is the smallest number such that $1 + \epsilon \neq 1$ when computed in floating point; for 64-bit
double precision conforming to the IEEE 754 standard, $\epsilon \approx 2.2 \times 10^{-16}$.
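The near singularity of the leading 3 x 3 submatrix in the example above can be checked by hand. Writing $c = 1 - \epsilon$ (a shorthand introduced only for this check) and expanding the determinant along the first row gives:

```latex
\begin{aligned}
\det \begin{bmatrix} 1 & 3 & c \\ 3 & 1 & 3 \\ c & 3 & 1 \end{bmatrix}
  &= 1\,(1 - 9) \;-\; 3\,(3 - 3c) \;+\; c\,(9 - c) \\
  &= -17 + 18c - c^2 \\
  &= -16\,\epsilon - \epsilon^2 \qquad (c = 1 - \epsilon).
\end{aligned}
```

The determinant therefore vanishes in proportion to $\epsilon$ while the entries of the submatrix remain of order one, which is consistent with a condition number that grows like $1/\epsilon$.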

For one class of matrices, it is possible to be certain that all of the leading
principal submatrices are at least as well conditioned as $T_n$ itself. If $T_n$ is positive
definite, then the interlacing property of the eigenvalues of leading principal submatrices
guarantees that no leading principal submatrix will be more ill conditioned than
$T_n$.

Stability  of  Superfast  Toeplitz  Algorithms 

We  have  seen  that  the  fast  Toeplitz  algorithms  are  unstable  whenever  one 
of  the  smaller  problems  solved  in  the  course  of  the  computation  is  ill  conditioned. 
The  superfast  algorithms  have  similar  characteristics,  but  the  manner  in  which  the 
problem  is  divided  is  different,  so  the  subproblems  whose  condition  determines  the 
stability  of  the  algorithm  are  also  different.  Each  stage  of  the  generalized  Schur 
algorithm  solves  two  Schur  problems  of  half  the  size  of  the  input  problem.  The 
subdivision  of  the  problem  continues  until  the  original  problem  of  size  n has  been 
divided into $n/N_T$ smaller problems, which are then solved by the standard Schur
algorithm.  The  generalized  Schur  algorithm  will  therefore  be  unstable  if  one  or  more 
of  these  subproblems  is  ill  conditioned. 

One of these subproblems is the Schur problem of size $N_T$ determined by the
leading principal submatrix of dimension $N_T$; the other problems are functions of
both the input vectors and intermediate results, and are not easily related to the
entries of $T_n$. A brief analysis of the stability of superfast algorithms has been given
by Bunch [66], which indicates that the generalized Schur algorithm may be unstable
if $T_{n+1}$ is not positive definite.

It is interesting to note that the superfast algorithms remain $O(n \log^2 n)$ even
if a stable $O(N_T^3)$ method is used to solve the final series of subproblems. This
does not render the superfast algorithms stable, because it is still possible for a
subproblem of size $N_T$ to be ill conditioned. It does, however, remove the possibility
of ill conditioning for the submatrices of the $N_T \times N_T$ systems solved at the lowest
level, and make it possible for the condition of the subproblem to be bounded exactly
(see equation (6.5)). A more practical strategy would be to employ a fast $O(N_T^2)$
method with the capability to "step over" nearly singular submatrices, such as those
given by Chan and Hansen [68] and Zarowski [69]. Although extensive numerical
tests have shown that loss of accuracy due to ill conditioning of subproblems is rare,
this approach is available for those situations where it proves to be a problem.

Although  determining  beforehand  whether  the  superfast  algorithms  are  stable 
for  a given  input  is  difficult,  it  is  simple  to  determine  during  a computation  whether 
the  problem  being  solved  is  ill  conditioned.  A poorly  conditioned  problem  at  any  step 
of the Schur algorithm will be indicated by a small prediction error $E_i$ or a large result
a for  that  step;  since  the  Schur  algorithm  is  the  only  potentially  unstable  element  of 
the  generalized  Schur  algorithm,  this  provides  a reliable  test  for  ill  conditioning.  For 
the  problems  to  be  addressed  in  the  following  chapter,  condition  number  estimation 
is  not  necessary,  but  in  cases  where  a condition  number  estimate  is  needed,  it  may 
be  computed  from  the  prediction  error  sequence  [70].  This  problem  is  considered  in 
greater  detail  in  the  next  chapter. 


CHAPTER  6 

FAST  ALGORITHMS  FOR  TOEPLITZ  EIGENDECOMPOSITION 


This  chapter  describes  the  application  of  the  fast  Toeplitz  solution  methods 
developed  in  the  previous  chapter  to  the  calculation  of  eigenvalues  and  eigenvectors. 
The  subspace  techniques  all  require  knowledge  of  the  eigenvectors  and  eigenvalues  of 
the  estimated  correlation  matrix  R;  when  a non-Toeplitz  autocorrelation  estimate  is 
used, this requires $O(M^3)$ operations (where M is the size of R), a heavy computational
burden for large M. This has been a barrier to the use of subspace techniques in
situations  where  large  autocorrelation  matrices  are  needed  to  achieve  high  accuracy. 

Unfortunately,  as  we  have  seen  in  Chapter  4,  when  the  data  records  are  long,  a 
large  autocorrelation  matrix  is  required  to  approach  the  statistical  limits  to  frequency 
estimation  accuracy.  The  effects  of  using  Toeplitz  autocorrelation  estimates  were  also 
examined,  and  it  was  shown  that,  although  the  performance  of  estimators  using  a 
Toeplitz matrix was usually somewhat poorer than the standard (non-Toeplitz)
implementations for constant matrix dimension M, acceptable results could still be
obtained, particularly when the size of the input data record was large. The fast
Toeplitz solution methods described in the preceding chapter make possible fast
algorithms for the eigenanalysis of Toeplitz matrices. These allow the computation of any
N eigenvalues and eigenvectors of an $M \times M$ Toeplitz autocorrelation estimate R in
$O(NM^2)$ or $O(NM \log^2 M)$ operations, depending on the choice of Toeplitz solution
method.

When these fast Toeplitz eigendecomposition techniques are employed in subspace
algorithms for frequency estimation, much larger autocorrelation matrices can
be used. In many cases, the improvement in frequency estimation performance due
to the larger matrix more than compensates for the degradation caused by the use of
a Toeplitz estimate of the autocorrelation matrix, so these fast implementations offer
important computational advantages.

The  Standard  Eigenproblem 

We will begin with a discussion of fast algorithms for the standard Toeplitz
eigenproblem $T_M e = \lambda e$, which is the basis for subspace algorithms for frequency
estimation in white noise. The extension of this approach to the generalized Toeplitz
eigenproblem $T_M e = \lambda \Sigma_M e$ is straightforward, and will be described in the last
section.

Fast  Computation  of  the  Minimum  Eigenvalue 

Using  conventional  fast  algorithms  for  solving  Toeplitz  systems,  Cybenko  and 
Van Loan [71] developed an $O(M^2)$ approach for finding the minimum eigenvalue of an
$M \times M$ symmetric Toeplitz matrix $T_M$. Similar approaches have also been proposed
by Hayes and Clements [72], and by Hu and Kung [73]. These methods for fast
Toeplitz eigendecomposition are based on the same strategy: the direct search for the
zeros of the characteristic polynomial $\det(\lambda I - T_M)$, which are the eigenvalues of $T_M$.
The  determinant  is  computed  without  forming  the  characteristic  polynomial;  explicit 
calculation  of  the  characteristic  polynomial  leads  to  methods  with  poor  numerical 
properties. 

For general matrices, computation of the determinant requires $O(M^3)$ operations,
and since several evaluations of the determinant are usually needed to locate a
zero, this method is almost always less efficient than other $O(M^3)$ techniques, such as
Householder reduction and QR iteration. However, for structured matrices whose
determinants are easier to compute, a direct search for zeros of the determinant is often
an effective technique, especially when only a subset of the eigenvalues is needed.
In fact, Cybenko and Van Loan's algorithm for Toeplitz eigendecomposition, which
is described below, is strikingly similar to the bisection method [74], which is widely
used for computing a subset of the eigenvalues of a symmetric tridiagonal matrix.

All of the fast Toeplitz eigendecomposition methods are based on the same
properties of the prediction error for the Yule-Walker problem; we will begin with a
description of the Cybenko-Van Loan algorithm, which is the basis for all of the
algorithms considered here. By definition, the minimum eigenvalue $\lambda_M$ and its associated
eigenvector $e_M$ satisfy the equation

\[ T_M e_M = \lambda_M e_M. \]

We will assume that $\lambda_M$ is less than the minimum eigenvalue $\sigma_{M-1}$ of $T_{M-1}$, the
largest leading principal submatrix of $T_M$. Now partition $T_M$ and $e_M$ into

\[ T_M = \begin{bmatrix} t_0 & t_1^T \\ t_1 & T_{M-1} \end{bmatrix}, \qquad e_M = \begin{bmatrix} x \\ y \end{bmatrix}. \]

The restriction that $\lambda_M < \sigma_{M-1}$ implies that $x \neq 0$, because if $x = 0$, $T_{M-1} y = \lambda_M y$,
which, together with the interlacing property of eigenvalues, requires $\lambda_M = \sigma_{M-1}$.
(For autocorrelation estimates computed from data with random noise, the
eigenvalues are strictly separated with probability one, so the requirement that
$\lambda_M < \sigma_{M-1}$ is satisfied with probability one.) Separating the equations gives

\[ T_{M-1} y + t_1 x = \lambda_M y, \tag{6.1} \]

\[ t_1^T y + t_0 x = \lambda_M x. \tag{6.2} \]

From equation (6.1):

\[ y = -(T_{M-1} - \lambda_M I)^{-1} t_1 x \tag{6.3} \]

if $(T_{M-1} - \lambda_M I)^{-1}$ exists. Let a be the solution to

\[ (T_{M-1} - \lambda_M I) a = -t_1, \]

which exists whenever a solution to equation (6.3) exists. Then

\[ a = -(T_{M-1} - \lambda_M I)^{-1} t_1. \tag{6.4} \]

Substituting equation (6.4) and equation (6.3) into equation (6.2),

\[ t_1^T a x + t_0 x = \lambda_M x. \]

Since $x \neq 0$,

\[ t_1^T a + t_0 = \lambda_M. \]

We can see that any eigenvalue of $T_M$ is a root of

\[ E(\lambda) = t_0 - \lambda + t_1^T a, \]

where $E(\lambda)$ is the final prediction error for the Yule-Walker system

\[ (T_M - \lambda I) \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} E(\lambda) \\ 0 \end{bmatrix}. \]

From the properties of the prediction error discussed in Chapter 5,

\[ E(\lambda) = \frac{\det(T_M - \lambda I)}{\det(T_{M-1} - \lambda I)}, \]


so the roots of $E(\lambda)$ are the eigenvalues of $T_M$, and the poles of $E(\lambda)$ are the
eigenvalues of $T_{M-1}$, as long as the eigenvalues of $T_M$ and $T_{M-1}$ are strictly separated.
The computation of the ratio of the determinant of $T_M$ to that of its largest leading
principal submatrix, rather than the determinant of $T_M$ itself, is also advantageous
from a computational standpoint when $T_M$ has a group of closely spaced
eigenvalues [74]. In this case $\det(T_M - \lambda I) = \prod_{k=1}^{M} (\lambda_k - \lambda)$ can be extremely small for values
of $\lambda$ between the eigenvalues in the group, causing difficulty in determining the
locations of the zeros accurately. On the other hand, if the eigenvalues of $T_{M-1}$ strictly
separate those of $T_M$, $E(\lambda)$ has a pole between each pair of roots, and numerical root
location is greatly simplified. It might appear that the possibility of $E(\lambda)$ becoming
infinite is a drawback to this approach; however, in floating-point computations it
is not unusual for the values of polynomials of high degree to exceed the maximum
value allowed by the floating-point system being used, so the determinant is often
effectively infinite in any case.

In finding the roots of $E(\lambda)$, the derivative with respect to $\lambda$ is also useful:

\[ E'(\lambda) = -1 - a^T a. \]

The fact that $E'(\lambda)$ is always negative further simplifies the search for roots of $E(\lambda)$.
As we shall see below, the use of derivative information and the proper choice of a
numerical zero-locating technique can minimize the computation required to find the
eigenvalues of $T_M$.
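The derivative formula can be verified in a few lines from the definition of a. The short derivation below is a worked check added for clarity; it assumes only that $T_{M-1}$ is symmetric and that $T_{M-1} - \lambda I$ is invertible at the point of interest.

```latex
\begin{aligned}
E(\lambda) &= t_0 - \lambda + t_1^T a(\lambda),
  \qquad a(\lambda) = -(T_{M-1} - \lambda I)^{-1} t_1, \\
\frac{da}{d\lambda} &= -(T_{M-1} - \lambda I)^{-2} t_1
  = (T_{M-1} - \lambda I)^{-1} a, \\
E'(\lambda) &= -1 + t_1^T (T_{M-1} - \lambda I)^{-1} a
  = -1 - a^T a .
\end{aligned}
```

The last step uses the symmetry of $T_{M-1}$, which gives $t_1^T (T_{M-1} - \lambda I)^{-1} = -a^T$. Because $a^T a \ge 0$, the derivative never exceeds $-1$.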
Cybenko and Van Loan proposed the use of bisection and Newton's method to
determine the zero of $E(\lambda)$ between 0 and $\sigma_{M-1}$, which is $\lambda_M$, the smallest eigenvalue
of $T_M$. Their algorithm can be summarized as follows:


Algorithm 6.1: Cybenko-Van Loan Algorithm

find λ(0) such that λ_M < λ(0) < σ_{M-1}
while |E(λ(k-1))| > ε
    solve (T_{M-1} - λ(k-1) I) a = -t_1
    λ(k) = λ(k-1) - E(λ(k-1)) / E'(λ(k-1))
end


Although Cybenko and Van Loan employed the Levinson-Durbin algorithm, it
was noted in Chapter 5 that solution of $(T_{M-1} - \lambda^{(k)} I) a = -t_1$ can be accomplished
either by a conventional fast algorithm, or by a superfast technique, since both compute
the same solution and the same sequence of prediction errors. In addition, the
associated eigenvector is easily obtained once a zero $\lambda_i$ of $E(\lambda)$ has been found. The
eigenvector is

\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ a x \end{bmatrix}, \]

so the normalized eigenvector associated with $\lambda_i$ is

\[ e_i = \frac{1}{\sqrt{1 + a^T a}} \begin{bmatrix} 1 \\ a \end{bmatrix}. \]

If the number of iterations necessary for $\lambda$ to converge to $\lambda_i$ is independent of M (as
is found to be the case in practice), then Algorithm 6.1 is an $O(M^2)$ or $O(M \log^2 M)$
method, depending on the choice of Toeplitz solution algorithm, for finding an
eigenvalue and the associated eigenvector of a symmetric Toeplitz matrix.
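The minimum-eigenvalue iteration can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the Cybenko-Van Loan implementation itself: the names `durbin` and `min_eigenpair` are hypothetical, a plain Levinson-Durbin recursion supplies both the solution a and the prediction errors, the starting interval comes from a Gershgorin-style bound (an assumption of this sketch), and the eigenvalues of $T_M$ and $T_{M-1}$ are assumed to be strictly separated.

```python
import math


def durbin(t, lam):
    """Levinson-Durbin recursion for T - lam*I (T symmetric Toeplitz, first column t).

    Returns (a, errors) with (T - lam*I) [1; a] = [errors[-1]; 0]. errors[:-1] are
    the prediction errors belonging to the leading (M-1) x (M-1) submatrix.
    """
    a, errors = [], [t[0] - lam]
    for k in range(1, len(t)):
        acc = t[k] + sum(a[j] * t[k - 1 - j] for j in range(k - 1))
        refl = -acc / errors[-1]
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        errors.append(errors[-1] * (1.0 - refl * refl))
    return a, errors


def min_eigenpair(t, tol=1e-12, max_steps=200):
    """Smallest eigenvalue and unit-norm eigenvector of the symmetric Toeplitz T."""
    radius = 2.0 * sum(abs(v) for v in t[1:])
    low, high = t[0] - radius, t[0]       # low <= lambda_M and sigma_{M-1} <= high
    # Bisection phase: find a start with lambda_M < lam < sigma_{M-1}.
    for _ in range(max_steps):
        lam = 0.5 * (low + high)
        a, errors = durbin(t, lam)
        if any(e < 0 for e in errors[:-1]):
            high = lam                    # lam lies above sigma_{M-1}
        elif errors[-1] > 0:
            low = lam                     # lam lies below lambda_M
        else:
            break
    # Newton phase on E(lam), with E'(lam) = -1 - a^T a.
    scale = max(abs(v) for v in t)
    for _ in range(max_steps):
        if abs(errors[-1]) <= tol * scale:
            break
        lam -= errors[-1] / (-1.0 - sum(v * v for v in a))
        a, errors = durbin(t, lam)
    norm = math.sqrt(1.0 + sum(v * v for v in a))
    return lam, [1.0 / norm] + [v / norm for v in a]
```

Starting Newton's method between $\lambda_M$ and $\sigma_{M-1}$ matters: on that interval $E(\lambda)$ is decreasing and concave, so the iterates approach the root from above without crossing the pole at $\sigma_{M-1}$.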
Fast Computation of Any Eigenvalue and Eigenvector

The approach used by Cybenko and Van Loan was employed by Trench [75,76],
and later by Noor and Morgera [77], in an $O(M^2)$ method for finding any eigenvalue
and its associated eigenvector. The Cybenko-Van Loan algorithm may be used without
modification for finding any eigenvalue; however, doing so requires solving
indefinite Toeplitz systems, and the accuracy of the answers may therefore be affected
by the instability of the solution algorithms. Several approaches to resolving this
difficulty are discussed in a later section.

The procedure described in this chapter is a further development of the approach
taken by Trench, with several improvements to enhance the efficiency and
reliability of the algorithm. These include the use of a superfast Toeplitz solver to
reduce the asymptotic complexity of the algorithm to $O(M \log^2 M)$, the development
of  a test  to  detect  whether  the  computed  results  have  been  corrupted  by  numerical 
instability,  and  the  extension  of  the  algorithm  to  solve  the  symmetric  definite  Toeplitz 
generalized  eigenproblem. 

In  the  following  sections,  the  improved  Toeplitz  eigendecomposition  algorithm 
for the standard eigenproblem $T_M e = \lambda e$ will be described. The algorithm proceeds
in  two  stages:  the  desired  eigenvalue  is  first  bracketed,  then  a numerical  root-finding 
procedure  is  used  to  locate  the  eigenvalue  exactly.  A detailed  discussion  of  each  of 
these  stages  is  given  in  the  sections  below. 

Bracketing  the  Desired  Eigenvalue 

It was shown above that the roots of $E(\lambda)$ are the eigenvalues of $T_M$; but
before a root-finding procedure can be employed to locate a specific eigenvalue, a
bracketing interval must be found that contains only the desired root $\lambda_i$. Because
eigenvalues of the largest leading principal submatrix $T_{M-1}$ correspond to poles of the
function $E(\lambda)$, it is also desirable that the bracketing interval exclude all eigenvalues
of $T_{M-1}$.

Brackets on the location of a desired eigenvalue may be easily determined
by the use of the prediction error sequence resulting from solving the Yule-Walker
equation defined by $T_M - \lambda I$. This procedure is sometimes referred to as "slicing"
the spectrum of $T_M$ [78], because each solution for a given $\lambda$ determines the number
of eigenvalues above and below $\lambda$. An illustration of the effect of one "slice" on
the bracketing intervals for each eigenvalue is given by Figure 6.1 below, where the
vertical lines indicate the bracketing intervals for each eigenvalue.

Figure 6.1: Effect of "Slicing" the Spectrum at a Single Value of λ on the
Eigenvalue Bracket Intervals (panels: Before, After)

A simple bisection procedure is used to compute a bracketing interval for
the desired eigenvalue; the number of eigenvalues of $T_M$ that are greater than $\lambda$ is
found by solving the Yule-Walker problem for $T_M - \lambda I$, and counting the number
of prediction errors greater than zero. Because the first M - 1 prediction errors
correspond to the matrix $T_{M-1} - \lambda I$, the results of the same calculation may also be
used to ensure that no eigenvalue of $T_{M-1}$ is included in the interval. The procedure
is summarized in Algorithm 6.2.


Algorithm 6.2: Eigenvalue Bracketing

Determine initial brackets l_i < λ_i < u_i
k_1 = M        (number of eigenvalues of T_M greater than l_i)
m_1 = M - 1    (number of eigenvalues of T_{M-1} greater than l_i)
k_2 = 0        (number of eigenvalues of T_M greater than u_i)
m_2 = 0        (number of eigenvalues of T_{M-1} greater than u_i)
α = t(0)
while k_1 - k_2 > 1 or m_1 ≠ m_2
    λ = (l_i + u_i)/2
    t(0) = α - λ
    Solve the Yule-Walker problem T_{M-1} a = -t
    m = M - 1 - (number of negative e(j) for 1 ≤ j ≤ M - 1)
    k = M - (number of negative e(j) for 1 ≤ j ≤ M)
    if k < i
        u_i = λ
        k_2 = k
        m_2 = m
    else
        l_i = λ
        k_1 = k
        m_1 = m
    end
end
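A compact Python rendering of the bracketing procedure is sketched below. Several details are assumptions of this sketch rather than part of Algorithm 6.2: the function names are hypothetical, the prediction errors come from a plain Levinson-Durbin recursion, the eigenvalues are numbered from largest (i = 1) to smallest (i = M), and the initial brackets are Gershgorin-style bounds widened by unequal margins so that the first midpoint does not land on t(0), where the first prediction error would be exactly zero.

```python
def prediction_errors(t, lam):
    """Prediction errors E_0..E_{M-1} of the Levinson-Durbin recursion for T - lam*I."""
    a, errors = [], [t[0] - lam]
    for k in range(1, len(t)):
        acc = t[k] + sum(a[j] * t[k - 1 - j] for j in range(k - 1))
        refl = -acc / errors[-1]
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        errors.append(errors[-1] * (1.0 - refl * refl))
    return errors


def bracket(t, i, max_steps=200):
    """Bracket the i-th largest eigenvalue of T, excluding every eigenvalue of T_{M-1}."""
    m = len(t)
    radius = 2.0 * sum(abs(v) for v in t[1:])
    # Unequal margins keep the first midpoint away from t[0] (assumption of this sketch).
    lower, upper = t[0] - 1.1 * radius, t[0] + 1.3 * radius
    k1, m1 = m, m - 1        # eigenvalues of T_M and T_{M-1} greater than lower
    k2, m2 = 0, 0            # eigenvalues of T_M and T_{M-1} greater than upper
    for _ in range(max_steps):
        if k1 - k2 <= 1 and m1 == m2:
            return lower, upper
        lam = 0.5 * (lower + upper)
        errors = prediction_errors(t, lam)
        m_above = sum(e > 0 for e in errors[:-1])   # eigenvalues of T_{M-1} above lam
        k_above = sum(e > 0 for e in errors)        # eigenvalues of T_M above lam
        if k_above < i:
            upper, k2, m2 = lam, k_above, m_above
        else:
            lower, k1, m1 = lam, k_above, m_above
    raise RuntimeError("bracketing did not converge")
```

The sketch makes no attempt to detect an ill-conditioned evaluation; it simply assumes that no slice point falls on an eigenvalue of a leading principal submatrix.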


Effects  of  Instability  on  the  Bracketing  Process 

In light of the fact that the solution algorithm used to calculate the prediction
errors is unstable when $\lambda$ is very close to an eigenvalue of one of the leading principal
submatrices of $T_M$, it is clear that we must examine the effects of instability on
the accuracy of the bracketing process. There are $M(M-1)/2$ eigenvalues of the
leading principal submatrices of $T_M$, distributed between $\lambda_1$ and $\lambda_M$; each of these
is surrounded by a small region where solution of $(T_M - \lambda I)a = -t$ using fast or
superfast algorithms is unstable.

It should be remarked that numerical experiments have shown that the probability
of an evaluation point falling within one of these unstable regions is extremely
small, at least for the types of matrices encountered in subspace estimation. During
the computation of hundreds of thousands of eigenvalues and eigenvectors of randomly
generated autocorrelation matrices, with dimensions ranging from 17 to 16385, failure
of the bracketing algorithm has never occurred. Nevertheless, it is clearly desirable to
have a test for instability to avoid any possibility of error in the bracketing process,
and it is essential that the test be efficient.

In fact, several efficient tests do exist. The simplest of these forms the first line
of defense against errors in the bracketing process; it is based on consistency checking
of the results of the bracket computations. Suppose the slicing procedure (solution of
the Yule-Walker problem defined by $T_M - \lambda I$) is carried out at three different values
of $\lambda$, $\lambda_a > \lambda_b > \lambda_c$, with resulting prediction error sequences $E_a$, $E_b$, and $E_c$. If we
define $N_{neg}(E(1:k))$ to be the number of negative entries among the first k elements
of E, then, because the number of negative prediction errors equals the number of
eigenvalues lying below the slice point, the interlacing property requires that
$N_{neg}(E_a(1:k)) \ge N_{neg}(E_b(1:k)) \ge N_{neg}(E_c(1:k))$ for any k between 0 and M. If E is stored
when each bracket point is calculated, it is simple to test the consistency of the results
of each evaluation. In practice, storage of the full prediction error vector for each
bracket point is often impractical, since this requires $2M^2$ memory locations. However,
to compute the desired bracket interval, the values $N_{neg}(E(1:M-1))$ and $N_{neg}(E(1:M))$
must be stored anyway, and these can form the basis for a useful test that requires
almost no additional computation.
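A consistency test of this kind might be sketched as follows. The function name and the record layout are hypothetical. The first check (the two counts at one slice point differ by at most one) is an additional consequence of interlacing that is assumed here rather than stated in the text, and the second check encodes the fact that the number of negative prediction errors counts the eigenvalues below the slice point, so it cannot decrease as λ increases.

```python
def consistent(records):
    """Check slice results for consistency.

    records: iterable of (lam, n_sub, n_full), where n_sub and n_full are the numbers
    of negative prediction errors among the first M-1 and among all M entries.
    """
    ordered = sorted(records)
    for _, n_sub, n_full in ordered:
        # Interlacing: T_M has either as many eigenvalues below lam as T_{M-1}, or one more.
        if n_full - n_sub not in (0, 1):
            return False
    for (_, s0, f0), (_, s1, f1) in zip(ordered, ordered[1:]):
        # Moving the slice point upward can only add eigenvalues below it.
        if s1 < s0 or f1 < f0:
            return False
    return True
```

Only the two stored counts per slice point are needed, so the test adds a handful of integer comparisons to each evaluation.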

A second type of test attempts to estimate the condition of each problem as
it is solved. As discussed earlier, any computational method that divides a problem
into smaller pieces faces the possibility that, although the problem as a whole is well
conditioned, the pieces may not be. This possibility exists for both the fast and
superfast algorithms; only the way the problem is divided is different. A test that
determines the condition of each subproblem can detect when the results of a bracket
computation may be in error.

In the fast algorithms, the Levinson-Durbin algorithm is used to compute the
prediction errors, and it is unstable only if one of the leading principal submatrices of
$T_M$ is ill conditioned [65]. If $T_M$ is a Toeplitz matrix, the largest leading principal
submatrix $T_{M-1}$ is nonsingular, and

\[ T_M \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} E_{M-1} \\ 0 \end{bmatrix} \]

with $E_{M-1} \neq 0$, then Theorem 5.1 shows that

\[ T_M^{-1} = \frac{1}{E_{M-1}} \left( L L^T - M M^T \right). \]

Using this and the facts that $\|L\|_1 = \|L^T\|_1 = 1 + \|a\|_1$ and $\|M\|_1 = \|M^T\|_1 = \|a\|_1$,
it is simple to bound $\|T_M^{-1}\|_1$:

\[ \|T_M^{-1}\|_1 \le \frac{(1 + \|a\|_1)^2 + \|a\|_1^2}{|E_{M-1}|} = \frac{1 + 2\|a\|_1 + 2\|a\|_1^2}{|E_{M-1}|}. \]

The 1-norm condition of $T_M$ is therefore bounded by¹

\[ \kappa_1(T_M) \le \frac{\|T_M\|_1 \left( 1 + 2\|a\|_1 + 2\|a\|_1^2 \right)}{|E_{M-1}|}, \tag{6.5} \]

and a very similar derivation can be used to show that this bound also applies to
$\kappa_\infty(T_M)$. Because $\|T_M\|_1$ is easily computed, this provides a simple test for
ill-conditioning of the subproblems solved by the conventional fast algorithms; a notable
advantage of this approach is that the additional computation required is negligible,
and may be performed during the solution of each subproblem.
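The bound of equation (6.5) is inexpensive to evaluate once the Yule-Walker solution is available. The sketch below shows one way to compute it. The function names are hypothetical, and the Levinson-Durbin recursion is included only to make the example self-contained; in practice a and the final prediction error would already be on hand from the bracketing computation.

```python
def durbin(t):
    """Solve T [1; a] = [E; 0] for the symmetric Toeplitz T with first column t."""
    a, err = [], t[0]
    for k in range(1, len(t)):
        acc = t[k] + sum(a[j] * t[k - 1 - j] for j in range(k - 1))
        refl = -acc / err
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return a, err


def condition_bound(t):
    """Upper bound (6.5) on the 1-norm condition number of the symmetric Toeplitz T."""
    a, err = durbin(t)
    m = len(t)
    norm_a = sum(abs(v) for v in a)
    # 1-norm of T: largest absolute column sum.
    norm_t = max(sum(abs(t[abs(i - j)]) for i in range(m)) for j in range(m))
    return norm_t * (1.0 + 2.0 * norm_a + 2.0 * norm_a * norm_a) / abs(err)
```

For the 2 x 2 matrix with first column (2, 1), the exact 1-norm condition number is 3 and `condition_bound([2.0, 1.0])` evaluates to 5, so the bound holds with a modest margin in this case.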

For the superfast generalized Schur algorithms, the problem is recursively
subdivided into smaller subproblems, each of which is solved by the same algorithm.
The generalized Schur algorithm for solving the Yule-Walker problem is unstable
only if one of the subproblems is ill conditioned. As in the case of the Levinson-Durbin
algorithm, ill-conditioning of a subproblem for the Schur algorithm is often
indicated by small prediction errors and a large solution vector. The quantity
$\|T_M\|_1 (1 + 2\|a\|_1 + 2\|a\|_1^2)/|E_{M-1}|$ may still be used as an estimator of the stability
of the solution, although it is no longer a rigorous bound for the condition number.

By examining the prediction errors and solution vectors returned by the solution
algorithm, it is possible to detect ill-conditioning of a subproblem. Since the
bracketing process does not require specific values of $\lambda$, but only values that decrease
the bracketing interval significantly, the simplest solution to instability during the
bracketing phase is to discard the results of a poorly conditioned evaluation, and
choose a new $\lambda$ displaced from the original value by a small fraction of the width of
the bracketing interval.

¹ Both Cybenko [65] and Koltracht and Lancaster [70] have developed somewhat tighter bounds
on $\|T_M^{-1}\|$ than those given here, for the case when $T_M$ is positive definite. In particular, Koltracht
and Lancaster show that the term $(1 + 2\|a\|_1 + 2\|a\|_1^2)$ in equation (6.5) may be replaced by $(1 + \|a\|_1^2)$
if $T_M$ is positive definite. They also show that this is a tighter bound than those given by Cybenko.
However, the bound given by (6.5) is only a factor of two looser, and applies to indefinite Toeplitz
matrices as well.

The tests described above, although they are not designed to totally exclude
the possibility of an error in bracketing, greatly reduce the possibility that an
erroneous bracket interval will occur. If, despite these safeguards, an error occurs, it
will be discovered when the root-location phase fails to converge to an eigenvalue,
and only the amount of computation necessary to locate a single eigenvalue will have
been wasted. These tests do not address the somewhat more difficult problem of
determining the accuracy of the eigenvalues and eigenvectors computed during the
second phase; rigorous tests for the accuracy of these computations are described in
the next chapter.

Enhancing  the  Efficiency  of  the  Bracketing  Stage 

Use of bisection to bracket the desired eigenvalue is a common feature of all
earlier Toeplitz eigensolvers. In addition to stability verification techniques for the
bracketing phase, two further enhancements have been developed in this work to
improve its efficiency and accuracy. First, to reduce the number of Yule-Walker
solutions needed to isolate a root, an initial bracketing interval is constructed by
embedding the $M \times M$ Toeplitz matrix $T_M$ in a circulant matrix $C$. The eigenvalues of
a circulant matrix are given by the Fourier transform of its first column, so they can be
computed in $O(M \log M)$ operations; then, since $T_M$ is a leading principal submatrix
of $C$, the interlacing property may be used to compute upper and lower limits for
each eigenvalue of $T_M$. (This approach was suggested by Feyh and Mullis [79], in
the context of a method for solving the Toeplitz inverse eigenvalue problem, that
is, finding a Toeplitz matrix with a given set of eigenvalues.) These results are
used to initialize a table of brackets for each eigenvalue. Bounds for the minimum
and maximum eigenvalue of $T_M$ are also computed using the procedure given by
Hertz [80]. These computations are performed each time a new matrix is input for
solution.

Second, each time the Yule-Walker problem is solved for a specific value of $\lambda$,
the bracket table is updated. Because the bisection process to isolate an eigenvalue
solves the Yule-Walker problem for many nearby values of $\lambda$, maintaining a bracket
table dramatically reduces the amount of computation required for locating a group of
adjacent eigenvalues. In addition to the upper and lower bounds for the eigenvalues,
the values of $E(\lambda)$ and $E'(\lambda)$ at the bounds are also stored, so that they may be
provided as input to the root-finding routine; this saves two additional Yule-Walker
solutions per root.
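
The initial bracket table obtained from the circulant embedding can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the embedding of order $2M - 2$ is one possible choice, and the circulant eigenvalues are evaluated with a direct cosine sum for self-containment, whereas an FFT would be used to reach the $O(M \log M)$ cost quoted above. The bounds follow from the Cauchy interlacing property for a leading principal submatrix.

```python
import math


def circulant_bracket_table(t):
    """Initial (lower, upper) brackets for the eigenvalues of the symmetric
    Toeplitz matrix with first column t (length M >= 2), listed for the
    eigenvalues in descending order."""
    M = len(t)
    # First column of a symmetric circulant of order n = 2M - 2 whose
    # leading M x M block is the Toeplitz matrix.
    c = list(t) + list(t[-2:0:-1])
    n = len(c)
    # For a symmetric circulant the DFT of the first column is real and
    # reduces to a cosine sum.
    mu = sorted(
        (sum(c[j] * math.cos(2 * math.pi * j * k / n) for j in range(n))
         for k in range(n)),
        reverse=True,
    )
    # Interlacing: mu[k + n - M] <= lambda_k <= mu[k].
    return [(mu[k + n - M], mu[k]) for k in range(M)]
```

For the tridiagonal matrix with first column `[2, -1, 0, 0]`, the circulant eigenvalues are 4, 3, 3, 1, 1, 0, so the four brackets are (3, 4), (1, 3), (1, 3), and (0, 1); the exact eigenvalues $2 - 2\cos(k\pi/5)$ each fall inside the corresponding bracket.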


Locating the Roots of $E(\lambda)$

Once a root of $E(\lambda)$ has been bracketed, the second stage of the computation is
to locate the root precisely, using a numerical root-finding procedure to determine its
location. Many techniques have been employed for this purpose: Cybenko and Van
Loan [71] employed Newton's method; Hu and Kung [73], Rayleigh quotient iteration;
Trench [75], the Pegasus method (a variant of the secant method [81, p. 341]); and
Noor and Morgera [77], a modified form of Rayleigh quotient iteration.

To maximize the efficiency of the computation, it is clearly necessary to
minimize the number of Yule-Walker solutions. The number of solutions required is
determined by the root-finding technique employed; one measure of the rapidity with
which a root may be located is the asymptotic convergence rate. A convergence rate
of $p$ means that the error in the root location decreases asymptotically as
$\epsilon_{K+1} \propto \epsilon_K^p$, where
$K$ is the number of evaluations of the function (that is, the number of solutions of the
Yule-Walker equation). The asymptotic convergence rate for each of the techniques
mentioned above is shown in Table 6.1.

Table 6.1: Asymptotic Convergence Rates of Root-Finding Algorithms

algorithm                             convergence rate
bisection method                      1.000
Pegasus method                        1.642
Newton's method                       2.000
rational interpolation (degree = 3)   2.000
Rayleigh quotient iteration           3.000

The fastest techniques above have extremely rapid convergence rates; for
example, $p = 3$ implies that once asymptotic convergence has been reached, if a root
location accurate to 2 significant digits has been found, the next solution will
determine the root location with 6 digits of accuracy, and the following evaluation will
produce a location accurate to 18 significant digits. However, evaluating a technique
solely on the basis of its asymptotic convergence can be misleading: a technique with
$p = 2$ that requires 2 fewer function evaluations to reach asymptotic convergence and
locate the root to 2 significant digits will reach the accuracy limit of double precision
arithmetic (17 digits) at least as quickly as the $p = 3$ technique. The ideal root-finding method
for this application would be efficient in the initial location of the root, and also have
a high asymptotic convergence rate. We will discuss the performance of the most
effective techniques in the sections below.
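
The digit-counting argument above can be reproduced with a few lines of arithmetic. The sketch below assumes an idealized model in which each function evaluation multiplies the number of correct digits by $p$; the function name and the model itself are illustrative assumptions, and real iterations follow it only approximately.

```python
def evaluations_needed(target_digits, p, start_digits=2.0):
    """Evaluations needed to go from start_digits to target_digits correct
    digits when each evaluation multiplies the digit count by p."""
    digits, count = start_digits, 0
    while digits < target_digits:
        digits *= p
        count += 1
    return count


# Cubic convergence from 2 digits: 2 -> 6 -> 18.
cubic_17 = evaluations_needed(17, 3.0)
# Quadratic convergence from 2 digits: 2 -> 4 -> 8 -> 16 -> 32.
quadratic_17 = evaluations_needed(17, 2.0)
# The same comparison for a 12-digit target.
cubic_12 = evaluations_needed(12, 3.0)
quadratic_12 = evaluations_needed(12, 2.0)
```

Under this model the $p = 3$ method needs 2 further evaluations for either target, while the $p = 2$ method needs 4 evaluations for 17 digits and 3 for 12 digits. With a head start of 2 evaluations, the $p = 2$ method therefore finishes one evaluation earlier at a 12-digit tolerance and on the same evaluation at 17 digits, which is why the cost of reaching the asymptotic regime matters as much as the rate itself.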

Rayleigh  Quotient  Iteration 

Unlike the other techniques mentioned above, Rayleigh quotient iteration is
not a general technique for finding roots of functions; it is specifically an eigenvalue
location technique. It also requires the solution of a general Toeplitz system, rather
than the Yule-Walker system required by the other approaches. Although the
additional computation required can be minimized using the Gohberg-Semencul
relation, the computational burden per function evaluation is still higher than for other
techniques. Despite these disadvantages, the cubic convergence rate of the Rayleigh
quotient approach makes it an attractive candidate for eigenvalue location, and its
use has been proposed by several researchers [73,77]. The basic iteration is given by
Algorithm 6.3.

Algorithm 6.3: Rayleigh Quotient Iteration

    Find initial estimates $\lambda^{(0)}$ and $e^{(0)}$
    $s = 1$
    while $1/\sqrt{s} > \epsilon$
        Solve $(T_M - \lambda^{(k-1)} I)\,x = e^{(k-1)}$
        $s = x^T x$
        $\lambda^{(k)} = x^T T_M x / s$
        $e^{(k)} = x/\sqrt{s}$
    end
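
A minimal runnable sketch of Algorithm 6.3 is given below. It is illustrative only: the dense Gaussian elimination routine `solve` is a stand-in for the fast general Toeplitz solver discussed in the text, and the function names, the iteration cap `max_iter`, and the handling of an exactly singular shifted matrix are assumptions of the sketch.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting (dense stand-in for a
    fast Toeplitz solver)."""
    n = len(b)
    W = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(W[r][c]))
        W[c], W[p] = W[p], W[c]
        for r in range(c + 1, n):
            f = W[r][c] / W[c][c]
            for j in range(c, n + 1):
                W[r][j] -= f * W[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (W[r][n] - sum(W[r][j] * x[j] for j in range(r + 1, n))) / W[r][r]
    return x


def rayleigh_quotient_iteration(t, lam, e, eps=1e-10, max_iter=50):
    """Algorithm 6.3 for the symmetric Toeplitz matrix with first column t,
    starting from the estimates lam and e (e of unit length)."""
    n = len(t)
    T = [[t[abs(i - j)] for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        shifted = [[T[i][j] - (lam if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            x = solve(shifted, e)
        except ZeroDivisionError:
            break  # the shift is an eigenvalue to working precision
        s = sum(v * v for v in x)
        Tx = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(xi * yi for xi, yi in zip(x, Tx)) / s
        e = [v / math.sqrt(s) for v in x]
        if 1.0 / math.sqrt(s) <= eps:
            break
    return lam, e
```

The stopping quantity $1/\sqrt{s}$ is the norm of the residual $(T_M - \lambda^{(k-1)} I)e^{(k)}$, so the loop ends once the current vector is an eigenvector of the shifted matrix to within $\epsilon$.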

For almost any choice of initial values $\lambda^{(0)}$ and $e^{(0)}$, Rayleigh quotient iteration
will converge to a pair $(\lambda, e)$ composed of an eigenvalue of $T_M$ and the associated
eigenvector, and the convergence will ultimately be cubic [82]. Unfortunately, there is
no way to be certain that it will converge to the desired eigenvalue, unless it is known
that $\lambda^{(0)}$ and $e^{(0)}$ are closer to the desired pair than to any other. By combining
bisection with Rayleigh quotient iteration, Noor and Morgera [77] produced a simple
bracketing algorithm that locates a particular eigenvalue and eigenvector with the
same asymptotic convergence rate as Rayleigh quotient iteration.

Algorithm 6.4: Modified Rayleigh Quotient Iteration

    Find initial bounds $l_j < \lambda_j < u_j$
    Find initial estimates $\lambda_j^{(0)}$ and $e_j^{(0)}$
    $s = 1$
    while $\min(u_j - l_j,\ 1/\sqrt{s}) > \epsilon$
        Solve $(T_M - \lambda_j^{(k-1)} I)\,x = e_j^{(k-1)}$
        $s = x^T x$
        $q = x^T T_M x / s$
        if $l_j < q < u_j$
            $\lambda_j^{(k)} = q$
            $e_j^{(k)} = x/\sqrt{s}$
        else
            Update bounds $l_j$ and $u_j$ using the prediction error sequence
            $\lambda_j^{(k)} = (l_j + u_j)/2$
            $e_j^{(k)} = e_j^{(k-1)}$
        end
    end


Noor  and  Morgera  have  shown  empirically  that  this  approach  outperforms 
Newton’s  method  and  the  Pegasus  algorithm  when  used  in  a fast  Toeplitz  eigensolver 
[77].  The  major  disadvantage  of  the  modified  Rayleigh  quotient  iteration  is  its  slow 
initial  convergence:  because  it  employs  bisection,  it  locates  the  neighborhood  of  the 
root  with  only  linear  convergence,  and  then  abruptly  transitions  to  cubic  convergence 
once  the  Rayleigh  quotient  q falls  within  the  bounds. 

Rational  Polynomial  Interpolation 

A technique based on rational polynomial interpolation is an alternative
approach to location of roots; it has asymptotic convergence nearly as rapid as Rayleigh
quotient iteration, and better initial convergence properties. The high performance
of this approach is a consequence of the fact that the model it assumes for $E(\lambda)$ is
well matched to the actual form.

Almost all root location techniques with superlinear convergence work by using
the results of function evaluations to form a model of the function in the neighborhood
of a root, and using that model to select the next point at which to evaluate the
function; for example, Newton's method models the function as a line tangent to the
function at the most recent evaluation point. By matching the model as closely as
possible to the actual form of the function, excellent performance can be achieved.
For locating the roots of $E(\lambda)$, a rational polynomial is clearly an appropriate model,
since $E(\lambda) = \det(T_M - \lambda I)/\det(T_{M-1} - \lambda I)$.

A rational polynomial approximation of a function is characterized by the
degrees of the numerator and denominator polynomials; the possibilities include simple
polynomial approximation (degree of the denominator is zero), inverse polynomial
approximation (degree of the numerator is zero), and the case where the degrees of
the numerator and denominator are equal. For approximating $E(\lambda)$, where the
interlacing of the eigenvalues of $T_M$ and $T_{M-1}$ guarantees that there is a pole of $E(\lambda)$
between any two roots, an approximation with numerator and denominator of nearly
equal degree is the best choice, since within any interval the difference between the
number of poles and the number of zeros of $E(\lambda)$ is at most one.

The simplest function of this form is given by

$$f(\lambda) = \frac{K(\lambda - z)}{(\lambda - p_1)(\lambda - p_2)}.$$

This approximating function has four unknown coefficients; by using the values of
$E(\lambda)$ and $E'(\lambda)$ at the upper and lower bracket points, these coefficients may be
determined. The next function evaluation point is then given by the zero $z$ of the
approximating function. Efficient methods for performing this computation have been
developed by Larkin [83], and further improved by Norton [84]; Norton's
implementation has been used here to locate the roots of $E(\lambda)$.
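
To make the fitting step concrete, the sketch below computes the zero $z$ of the approximating function directly from the four interpolation conditions. It is a straightforward algebraic solution written for illustration, not the method of Larkin or Norton; the function name and argument order are assumptions. Writing the model as a linear numerator over a monic quadratic denominator, the conditions at the two bracket points reduce to a $2 \times 2$ linear system for the denominator values, from which the zero of the numerator follows.

```python
def rational_zero(a, fa, da, b, fb, db):
    """Zero of the function K (x - z) / ((x - p1)(x - p2)) that matches the
    values fa, fb and the derivatives da, db at the bracket points a and b.

    U and W are proportional to the denominator of the model evaluated at
    a and b; only their ratio is needed to locate the zero.
    """
    h = b - a
    g = (fb - fa) / h          # secant slope across the bracket
    U = fa * db - g * fb
    W = g * fa - da * fb
    return a - h * fa * U / (fb * W - fa * U)
```

When the sampled function is exactly of the assumed form, the returned point is its zero; for $E(\lambda)$ the result would serve as the next evaluation point. The sketch does not guard against a vanishing denominator or a zero that falls outside the bracket, both of which a practical routine would need to handle.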

Comparison

In numerical tests with an eigenvalue location tolerance $\epsilon = 10^{-12}$, using
17-digit arithmetic (IEEE 754 double precision), the rational function interpolation
method of locating the eigenvalues was found to be somewhat more efficient than
the modified Rayleigh quotient iteration used by Noor and Morgera, requiring on the
average 20-30% fewer evaluations of $E(\lambda)$. It was found that the rational function
method achieved faster than linear convergence almost immediately (within 2-3
function evaluations), while the modified Rayleigh quotient iteration typically required
4-6 evaluations before the Rayleigh quotient fell within the bounds and superlinear
convergence began.

The relative performance of the two methods depends on the location of the
desired eigenvalue. Rayleigh quotient iteration was less effective when locating an
eigenvalue within a closely spaced group, and more effective when the desired eigenvalue
was isolated, while the performance of rational interpolation was less sensitive to the
location of nearby eigenvalues. The performance of the two techniques also depends
on the precision $\epsilon$ with which eigenvalue locations were computed. Since Rayleigh
quotient iteration has faster asymptotic convergence, its performance compared to
rational function interpolation would improve for smaller tolerances, were it not for
the lower limit set by the precision of the floating-point format ($\epsilon \approx 2.2 \times 10^{-16}$);
conversely, rational function interpolation showed a larger performance advantage for
larger $\epsilon$. On a machine with higher precision arithmetic, for example, Cray 128-bit
double precision, Rayleigh quotient iteration would be the method of choice for small
$\epsilon$.

For the eigenvector computations described in the remainder of this work,
rational function interpolation with $\epsilon = 10^{-12}$ was employed. This was followed by a
single step of inverse "iteration," that is, the eigenvalue and eigenvector estimates
$\lambda^{(0)}$ and $e^{(0)}$ resulting from the rational function solution were used to define the equation
$(T_M - \lambda^{(0)} I)\,y = e^{(0)}$, and the result $e = y/\|y\|$ was taken as the eigenvector estimate.

The  Chan-Hansen  Algorithm:  A Stable  Method 

In  the  following  chapter,  we  will  discuss  tests  for  verifying  that  the  eigenvalues 
and  eigenvectors  computed  by  these  fast  techniques  are  correct.  To  provide  an  efficient 
means  of  recalculating  any  that  are  found  to  be  in  error,  it  is  necessary  to  have  a stable 
fast technique for solving the Yule-Walker problem. An algorithm recently developed
by Chan and Hansen [68] answers this need (in fact, the possibility of using this
algorithm for the Toeplitz eigenvalue problem is mentioned by Chan and Hansen).
The algorithm solves general Toeplitz systems with an asymptotic complexity of $4M^2$
operations, making it roughly twice as complex as the Levinson-Durbin algorithm,
and the same complexity as Trench's algorithm for solving general Toeplitz systems.
The fact that this algorithm solves general Toeplitz systems makes it possible to
employ it for any of the rational polynomial, Rayleigh quotient, or inverse iteration
methods. It is of course possible to employ this algorithm exclusively during the
eigenvalue  and  eigenvector  computations,  removing  the  need  for  testing,  but  it  has 
been  found  that  the  rarity  of  errors  makes  it  more  efficient  to  use  the  faster  algorithms 
and  test  the  results. 

The algorithm developed by Chan and Hansen, called the look-ahead
Levinson algorithm, is a modified form of Trench's algorithm that estimates the condition
number of each leading principal submatrix before attempting to solve it, and
incorporates an LU decomposition routine that allows it to step over one or more nearly
singular submatrices. The algorithm is stable for any Toeplitz matrix that does not
have several consecutive nearly singular leading principal submatrices, and provides
both an estimate of the condition number of the matrix and an accuracy estimate for
the computed solution.




Although the LU decomposition routine is $O(P^3)$, where $P$ is the maximum
number of consecutive ill-conditioned leading principal submatrices, numerical tests
[68, 85] on random matrices indicate that very few LU steps are required, and that
the typical overhead on matrices of order 1000-2000 is approximately 15%, compared
to  the  standard  Trench  algorithm.  The  accuracy  of  the  results  for  indefinite  matrices 
is  also  reported  to  be  better  than  that  obtained  from  Trench’s  algorithm.  The  Chan- 
Hansen  algorithm  is  thus  capable  of  solving  accurately  almost  all  of  the  problems 
that  cannot  be  accurately  solved  by  the  fast  and  superfast  algorithms,  and  provides 
a means  of  recomputing  those  eigenvalues  and  eigenvectors  found  to  be  in  error  by 
the  tests  described  in  the  next  chapter. 

The  Generalized  Eigenproblem 

The extension of the fast Toeplitz eigensolvers to the generalized Toeplitz
eigenproblem $T_M e = \lambda \Sigma_M e$ is straightforward, since the techniques used for eigenvalue
bracketing and location generalize with only slight modifications. The generalized
eigenvalues are the roots of the equation $\det(\lambda \Sigma_M - T_M) = 0$, so, just as in the
standard problem, we must locate the zeros of the prediction error function $E(\lambda)$.

The bracketing phase relies on the Sylvester inertia principle, which extends to
the generalized eigenproblem without difficulty [82, p. 46]. As we have seen above, the
efficiency of the location phase is greatly improved if the derivative $E'(\lambda)$ is available.
The Yule-Walker problem to be solved is

$$\begin{bmatrix} t_0 - \lambda\sigma_0 & t^T - \lambda s^T \\ t - \lambda s & T_{M-1} - \lambda\Sigma_{M-1} \end{bmatrix} \begin{bmatrix} 1 \\ a \end{bmatrix} = \begin{bmatrix} E(\lambda) \\ 0 \end{bmatrix}.$$

Separating the equations yields

$$t_0 - \lambda\sigma_0 + (t^T - \lambda s^T)\,a = E(\lambda),$$

$$t - \lambda s + (T_{M-1} - \lambda\Sigma_{M-1})\,a = 0.$$


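Taking the derivatives of these equations, we find that

$$E'(\lambda) = -\sigma_0 - s^T a + (t^T - \lambda s^T)\,a',$$

and

$$a' = (T_{M-1} - \lambda\Sigma_{M-1})^{-1}(s + \Sigma_{M-1} a).$$

Using the fact that $(t^T - \lambda s^T)(T_{M-1} - \lambda\Sigma_{M-1})^{-1} = -a^T$, we have

$$\begin{aligned} E'(\lambda) &= -\sigma_0 - s^T a + (t^T - \lambda s^T)(T_{M-1} - \lambda\Sigma_{M-1})^{-1}(s + \Sigma_{M-1} a) \\ &= -\sigma_0 - s^T a - a^T(s + \Sigma_{M-1} a). \end{aligned}$$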

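The derivative can be written as

$$E'(\lambda) = -\begin{bmatrix} 1 & a^T \end{bmatrix} \begin{bmatrix} \sigma_0 & s^T \\ s & \Sigma_{M-1} \end{bmatrix} \begin{bmatrix} 1 \\ a \end{bmatrix},$$

and consequently can be computed using a fast algorithm in $O(M \log M)$ operations.
Because $\Sigma_M$ is positive definite, $E'(\lambda)$ is always negative, just as in the standard case.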
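
The derivative formula can be checked numerically with a short sketch. The code below assumes that both $T_M$ and $\Sigma_M$ are symmetric Toeplitz and are supplied by their first columns, so that $T_M - \lambda\Sigma_M$ is itself symmetric Toeplitz and a textbook Levinson-Durbin recursion applies. The function names are illustrative, and the quadratic form is evaluated with a direct double sum rather than the fast $O(M \log M)$ product mentioned above.

```python
def yule_walker(p):
    """Levinson-Durbin recursion for the symmetric Toeplitz matrix with
    first column p; returns (a, E) with P [1, a]^T = [E, 0]^T."""
    a, E = [], float(p[0])
    for m in range(1, len(p)):
        k = -(p[m] + sum(a[i] * p[m - 1 - i] for i in range(m - 1))) / E
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        E *= 1.0 - k * k
    return a, E


def prediction_error_and_derivative(t, sigma, lam):
    """E(lam) and E'(lam) for the pencil (T, Sigma), both symmetric Toeplitz
    and given by their first columns t and sigma."""
    a, E = yule_walker([ti - lam * si for ti, si in zip(t, sigma)])
    v = [1.0] + a
    n = len(v)
    # E'(lam) = -[1, a^T] Sigma [1, a^T]^T
    quad = sum(v[i] * sigma[abs(i - j)] * v[j]
               for i in range(n) for j in range(n))
    return E, -quad
```

Comparing the returned derivative with a central difference of $E(\lambda)$ at nearby values of $\lambda$ gives a quick consistency check, and for a positive definite $\Sigma_M$ the returned derivative is negative, as stated above.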

The  initial  eigenvalue  bounds  for  the  standard  eigenproblem  were  found  by 
embedding  the  Toeplitz  matrix  in  a circulant  matrix,  finding  the  eigenvalues  of  the 
circulant  matrix  using  the  FFT,  and  using  the  interlacing  property  to  bound  the 
eigenvalues  of  the  original  Toeplitz  matrix.  The  interlacing  property  also  holds  for 
definite  matrix  pencils  [86,  p.  340],  and  so  exactly  the  same  technique  may  be  applied 
to find initial bounds for the generalized eigenvalues.

As before, either Rayleigh quotient iteration or inverse iteration can be used to
improve the accuracy of a generalized eigenvalue and eigenvector pair. Algorithm 6.5
shows how the iteration process extends to the generalized eigenvalue problem.


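Algorithm 6.5: Generalized Rayleigh Quotient Iteration

    Find initial estimates $\lambda^{(0)}$ and $e^{(0)}$
    $s = 1$
    while $1/s > \epsilon$
        Solve $(T_M - \lambda^{(k-1)} \Sigma_M)\,x = \Sigma_M e^{(k-1)}$
        $s = x^T \Sigma_M x$
        $\lambda^{(k)} = x^T T_M x / s$
        $e^{(k)} = x/\sqrt{s}$
    end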


One additional caution concerning the generalized eigenvalue problem is
appropriate: since the generalized eigenvectors are not orthogonal in the usual sense,
computational shortcuts based on equation (3.18) are no longer applicable. This can
lead to increased computation for noise subspace methods when $M \gg N$.

In  this  chapter,  we  have  seen  that  the  existence  of  fast  algorithms  for  the 
solution  of  Toeplitz  systems  makes  it  possible  to  solve  both  the  standard  eigenproblem 
and  the  definite  generalized  eigenproblem  for  symmetric  Toeplitz  matrices  or  matrix 
pencils  efficiently.  A procedure  has  been  presented  for  ensuring  that  the  initial  bounds 
computed  for  the  eigenvalues  are  reliable  despite  the  possible  instability  of  the  Toeplitz 
solution  algorithms.  In  the  next  chapter,  tests  will  be  developed  for  verifying  that 
instability  has  not  caused  errors  in  the  eigenvalues  and  eigenvectors  themselves. 


CHAPTER  7 

VERIFYING  THE  ACCURACY  OF  RESULTS 


The  preceding  chapter  describes  a method  for  efficiently  computing  eigenvalues 
and  eigenvectors  of  Toeplitz  matrices  that  is  based  on  fast  algorithms  for  solution 
of the Yule-Walker problem. Although the algorithms used cannot be guaranteed
stable, errors in the computed eigenvalues and eigenvectors due to instability in the
solution of the Yule-Walker problem are rare. When these algorithms are used to
solve  a system  with  an  ill-conditioned  subproblem,  for  which  they  are  unstable,  the 
ill-conditioning  of  the  subproblem  in  question  will  generally  be  revealed  by  a small 
value  of  the  prediction  error  or  by  a solution  vector  that  has  a large  norm. 

Because the systems solved during the bracketing phase are rarely close to
singular, and because a different evaluation point may be chosen if a particular $\lambda$
is found to cause instability, it is simple to avoid regions of instability during the
bracketing phase. In practice, it has been found that errors in the second phase,
computation of the desired eigenvalue and eigenvector, are also very rare. However,
because the process of computing the eigenvalues deliberately causes $E(\lambda)$ to become
as small as possible, and $\|a\|$ to be as large as possible, the techniques described in
the  previous  chapter  for  estimating  subproblem  condition  during  the  bracketing  phase 
are  not  useful  during  the  eigenvector  computation  phase.  The  consistency  tests  can 
still  be  used,  of  course,  and  when  an  eigenvalue  is  being  located  exactly,  it  is  feasible 
to  store  the  entire  prediction  error  vectors  for  the  upper  and  lower  bounds.  These 
tests  do  not  provide  an  indication  of  the  accuracy  of  the  computed  results,  however; 
tests  for  this  purpose  are  described  below. 




An  Extremely  Simple  Test 

Before  continuing  to  the  more  complicated  tests  that  form  the  central  focus  of 
this  chapter,  it  is  worthwhile  to  mention  a very  simple  test  that  is  capable  of  detecting 
gross errors in the computed eigenvectors. It is well known that the eigenvectors of
a symmetric $M \times M$ Toeplitz matrix are either symmetric ($e(i) = e(M - i + 1)$), or
antisymmetric ($e(i) = -e(M - i + 1)$) [87]. If an estimated eigenvector fails this test,
then  it  is  obviously  erroneous,  and  the  more  sophisticated  tests  described  below  are 
unnecessary.  However,  the  fact  that  an  eigenvector  is  symmetric  or  antisymmetric 
to  within  a given  tolerance  is  not  sufficient  to  establish  that  it  is  accurate,  nor  can 
meaningful  bounds  on  the  accuracy  of  the  subspaces  defined  by  a set  of  eigenvectors 
be  determined  solely  on  the  basis  of  symmetry  tests. 
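
The symmetry test can be sketched in a few lines. The function name and the choice of a tolerance relative to the largest component are assumptions made for this illustration; as noted above, passing the test does not establish that an eigenvector is accurate.

```python
def symmetry_class(e, tol=1e-8):
    """Classify a computed eigenvector as 'symmetric', 'antisymmetric', or
    None (neither, to within tol relative to the largest component)."""
    scale = max(abs(x) for x in e)
    # Compare each component e(i) with its mirror image e(M - i + 1).
    sym_err = max(abs(x - y) for x, y in zip(e, reversed(e)))
    anti_err = max(abs(x + y) for x, y in zip(e, reversed(e)))
    if sym_err <= tol * scale:
        return "symmetric"
    if anti_err <= tol * scale:
        return "antisymmetric"
    return None
```

A result of `None` indicates a gross error in the computed eigenvector, in which case the more expensive residual tests are unnecessary.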

Many  other  properties  of  the  symmetric  and  antisymmetric  eigenvectors  of 
Toeplitz  matrices  are  known,  but  most  are  of  little  assistance  in  verifying  results.  It 
is known, for example, that any $M \times M$ Toeplitz matrix has $\lceil M/2 \rceil$ symmetric and
$\lfloor M/2 \rfloor$ antisymmetric eigenvectors [87]. For some special cases, such as tridiagonal
Toeplitz matrices, and Toeplitz matrices which are the autocorrelation matrices of
stationary random processes with decreasing power spectra (that is, $R(i,j) = R(|i - j|)$
is the Fourier transform of a function $S(\omega)$ such that $S(\omega_1) > S(\omega_2)$ if $\omega_1 < \omega_2$),
it  has  been  shown  that  the  symmetric  and  antisymmetric  eigenvectors  appear  in 
alternating  order  [88].  Although  this  particular  property  does  not  apply  in  general, 
the  question  of  whether  some  form  of  ordering  relation  applies  for  the  symmetric  and 
antisymmetric  eigenvectors  of  all  Toeplitz  matrices  is  still  open. 

Error  Detection 

In this chapter, simple, efficient, and reliable tests for the accuracy of the
computed eigenvalues and eigenvectors will be described. Tests are developed both for
a single eigenvalue-eigenvector pair, and for a set of eigenvalues and their associated
eigenvectors, which define an invariant subspace. In the current application, we are
concerned with the signal and noise subspaces, that is, subspaces that are defined by
contiguous sets of eigenvalues, so the tests for subspaces will be specialized to that
case.

In order to take advantage of the fast algorithms for Toeplitz matrix-vector
multiplication, the accuracy tests will be carried out using a residual matrix. To
illustrate the use of a residual, consider the case when the computed invariant subspaces
and eigenvalues are exact: if $R$ is an $M \times M$ real symmetric matrix, the $N$ orthonormal
columns of $E$ define an invariant subspace of $R$, and $L$ is an $N \times N$ diagonal matrix
whose entries are the eigenvalues associated with the columns of $E$, then

$$RE - EL = 0.$$

Now, suppose that $E$ and $L$ are not known exactly, but that we have computed an
approximate $\hat{E}$ and $\hat{L}$. If we define a residual matrix $D$ as¹

$$R\hat{E} - \hat{E}\hat{L} = D,$$

then the size of $D$, measured using an appropriate matrix norm, can be used to
determine the accuracy of $\hat{E}$ and $\hat{L}$. Notice that if $R$ is Toeplitz, the product $R\hat{E}$ can
be computed using the fast Fourier transform. Since $N$ is usually much smaller than
$M$, we are usually less concerned with the efficiency of computing the other product
$\hat{E}\hat{L}$, but if $N$ is large enough that this is important, it is possible to choose $\hat{L}$ so that
this computation is very efficient as well; these issues will be discussed in more detail
below.

¹ In the usual notation, the residual is denoted by $R$; to avoid confusion with the autocorrelation
matrix, we will use $D$ to denote the residual here.

To interpret the results of this computation, it is necessary to relate the size of
the residual to the accuracy of the approximate eigenvalues and the subspace. This
connection is provided by a more complete form of the Sin $\Theta$ theorem first described
in Chapter 3 [40].

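Theorem 7.1 (Sin $\Theta$ Theorem) Let $R$ be an $M \times M$ real symmetric matrix, and
express the eigendecomposition of $R$ as

$$R = [E_S, E_N]\,\mathrm{diag}(\lambda_1, \ldots, \lambda_N, \lambda_{N+1}, \ldots, \lambda_M)\,[E_S, E_N]^T,$$

where $\lambda_1 > \lambda_2 > \ldots > \lambda_M$, the columns of the $M \times N$ matrix $E_S$ span the exact
signal subspace $\mathcal{S}$, and the columns of the $M \times (M - N)$ matrix $E_N$ span the exact
noise subspace $\mathcal{N}$.

In addition, let $\hat{R} = R + H$ also be a symmetric matrix, and express its
eigendecomposition as

$$\hat{R} = [\hat{E}_S, \hat{E}_N]\,\mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_N, \hat\lambda_{N+1}, \ldots, \hat\lambda_M)\,[\hat{E}_S, \hat{E}_N]^T,$$

where $\hat\lambda_1 > \hat\lambda_2 > \ldots > \hat\lambda_M$, the $N$ columns of $\hat{E}_S$ span the approximate signal subspace
$\hat{\mathcal{S}}$, and the $M - N$ columns of $\hat{E}_N$ span the approximate noise subspace $\hat{\mathcal{N}}$.

Finally, define the matrix residuals $D_S$ and $D_N$ as

$$D_S = R\hat{E}_S - \hat{E}_S L_S,$$

$$D_N = R\hat{E}_N - \hat{E}_N L_N,$$

where $L_S$ and $L_N$ are symmetric matrices, the eigenvalues of $L_S$ are $\hat\lambda_1, \ldots, \hat\lambda_N$, and
the eigenvalues of $L_N$ are $\hat\lambda_{N+1}, \ldots, \hat\lambda_M$.

Then:

$$\|\sin\Theta(\mathcal{S}, \hat{\mathcal{S}})\| \le \frac{\|D_S\|}{\hat\lambda_N - \lambda_{N+1}}$$

and

$$\|\sin\Theta(\mathcal{N}, \hat{\mathcal{N}})\| \le \frac{\|D_N\|}{\lambda_N - \hat\lambda_{N+1}}$$

for any unitarily invariant norm.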

The Sin $\Theta$ theorem allows the accuracy of a calculated subspace to be verified
by  examining  the  residual  matrix  D.  The  residual  can  be  calculated  for  an  invariant 
subspace  of  any  dimension,  and  provides  an  efficient  means  of  testing  the  accuracy  of 
complete  signal  or  noise  subspaces;  later  in  this  chapter,  we  will  also  discuss  a form 
of  this  theorem  that  may  be  used  to  test  individual  eigenvalues  and  eigenvectors. 

In  order  to  bound  the  error  in  the  subspace  estimate,  it  is  necessary  to  compute 
D,  compute  exactly  or  find  an  upper  bound  for  a unitarily  invariant  norm  ||D||,  and 
compute  or  find  a lower  bound  for  the  gap  between  the  eigenvalues  associated  with 
each  of  the  subspaces.  For  this  computation  to  be  practical  in  the  context  of  a 
fast  eigendecomposition  routine,  it  is  of  course  essential  that  these  computations  be 
performed  as  efficiently  as  possible.  We  will  consider  each  of  these  steps  in  turn  in 
the  following  sections. 


Computation  of  the  Residual 

Both of the residuals are the result of a computation of the form $RE - EL$;
since the computations are similar for both residuals, we will use a general notation in
discussing the problem. The matrix $R$ is $M \times M$, and the matrix $E$ is either $M \times N$,
for the signal subspace, or $M \times (M - N)$, for the noise subspace; here, the number
of columns in $E$ will be denoted by $N_E$. To compute $D$ without taking advantage
of any special structure, $2M^2N_E$ operations are required to compute $RE$, $2MN_E$ for
$EL$, and $MN_E$ to subtract the two resulting matrices.

Since the computation of $RE$ by simple matrix multiplication requires $O(M^2)$
operations, it is clear that we must perform this step more efficiently, particularly if
it is to be used with the superfast methods; otherwise, checking the results of the
computation will have a higher asymptotic complexity than the computation itself.
We have already seen in Chapter 5 that multiplication of a vector by a Toeplitz matrix
can be performed very efficiently using Algorithm 5.5. Computation of the product
$RE$ will require one forward FFT of length at least $2M - 1$ to transform $R$, $N_E$ forward
FFTs of the same length for the columns of $E$, $MN_E$ complex multiplications, and
$N_E$ inverse FFTs, for a total of $2N_E + 1$ FFTs of length at least $2M - 1$.

Using a split-radix FFT algorithm, the FFT of a real sequence of length $K = 2^k$
may be computed in $2K \log_2 K - 6K + 8$ operations, and the inverse FFT in
$2K \log_2 K - 5K + 9$ operations [58]. The transformed vector is stored in a compact
form that takes advantage of the symmetry of the Fourier transform of a real sequence,
and this also reduces the number of operations required to perform the point-by-point
multiplication of the transformed vectors to $M N_E/2$. Unfortunately, the least
efficient case occurs when $M = 2^J + 1$, exactly the size of the $T_M$ required for the
superfast eigendecomposition techniques. In this case, $2M - 1 = 2^{J+1} + 1$, and
$K = 2^{J+2} \approx 4M$. Despite this, the computation of D still requires less than
$(N_E + 1)(8M \log_2 M - 8M + 8) + M N_E/2 + N_E(8M \log_2 M - 4M + 9)$ operations, or about
$(16N_E + 8) M \log_2 M - 11.5 M N_E$. Since these algorithms are intended for use when
$M \gg N_E$, the reduction from $O(M^2 N_E)$ to $O(M N_E \log M)$ is very significant.
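
The FFT-based residual computation can be sketched as follows. This is a minimal illustration, not the split-radix implementation or Algorithm 5.5: it assumes NumPy's real FFT routines, a symmetric Toeplitz R given by its first column `r`, and a diagonal L. The function names `toeplitz_matmul` and `residual` are hypothetical.

```python
import numpy as np

def toeplitz_matmul(r, E):
    """Multiply the symmetric Toeplitz matrix with first column r by each column of E."""
    r = np.asarray(r, dtype=float)
    E = np.asarray(E, dtype=float)
    M = r.shape[0]
    # Embed R in a circulant matrix of power-of-two size K >= 2M - 1.
    K = 1 << (2 * M - 2).bit_length()
    c = np.zeros(K)
    c[:M] = r                     # first column of R
    c[K - M + 1:] = r[:0:-1]      # wrapped entries r[M-1], ..., r[1]
    C = np.fft.rfft(c)                          # one forward FFT for R
    Ef = np.fft.rfft(E, n=K, axis=0)            # N_E forward FFTs (columns zero-padded)
    prod = np.fft.irfft(C[:, None] * Ef, n=K, axis=0)   # N_E inverse FFTs
    return prod[:M]

def residual(r, E, lam):
    """Return D = R E - E L for L = diag(lam); the column scaling costs M * N_E operations."""
    return toeplitz_matmul(r, E) - E * np.asarray(lam, dtype=float)
```

For an exact eigenpair the returned residual is zero to rounding error, so its norm can be fed directly into the bounds discussed in this chapter.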

The next step to be considered is the computation of EL. In Theorem 7.1,
the only restriction placed on L was that its eigenvalues be known. Since eigenvalue
estimates are available from the eigenvector computation, one possibility is to choose
an L that has these estimates as its eigenvalues. The simplest choice is

\[ L_S = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_N), \]

\[ L_N = \mathrm{diag}(\hat\lambda_{N+1}, \ldots, \hat\lambda_M), \]

where the $\hat\lambda_i$ are those computed along with the eigenvectors. In this case only
$M N_E$ operations are needed to compute the product EL.

This approach has the advantage of efficiency, but by choosing L to have
a special form, we can in certain cases produce a tighter bound on the error in the
subspaces, with little additional computation. This is the consequence of a companion
to the Sin Θ theorem, the Tan Θ theorem [40].

Theorem 7.2 (Tan Θ Theorem) If $R$ and $\hat{R}$, their eigendecompositions, their
invariant subspaces, and the residuals $D_S$ and $D_N$ are as defined in Theorem 7.1,
and if, in addition,

\[ L_S = \hat{E}_S^T R \hat{E}_S \]

and

\[ L_N = \hat{E}_N^T R \hat{E}_N, \]

then

\[ \| \tan\Theta(\mathcal{S}, \hat{\mathcal{S}}) \| \le \frac{\|D_S\|}{\hat\lambda_N - \lambda_{N+1}} \]

and

\[ \| \tan\Theta(\mathcal{N}, \hat{\mathcal{N}}) \| \le \frac{\|D_N\|}{\lambda_N - \hat\lambda_{N+1}} \]

for any unitarily invariant norm.

The penalty paid for this tighter bound is the cost of computing the generalized
Rayleigh quotient $\hat{E}^T R \hat{E}$, and determining its eigenvalues. Computation of
$\hat{E}^T R \hat{E}$ merely requires multiplying $\hat{E}^T$ by the previously computed
$R\hat{E}$, at a cost of $2M N_E^2$ operations. It is then necessary to determine the
largest or smallest eigenvalue of L (depending on whether E is a signal or noise
subspace), since these are the $\hat\lambda$ that must be used in the denominator of the
bounds. The matrix L is $N_E \times N_E$, and if $N_E$ is small, it may be practical to
compute the eigenvalues by standard methods. Although this requires $4N_E^3/3$
operations, if $N_E \ll M$ the cost may still be acceptable. For example, if $N_E = 20$,
the cost is approximately 10,000 operations, which is negligible compared to the cost
of computing a single eigenvector when $M > 500$.

In cases where this computation is impractical, it is still possible to employ
the Tan Θ theorem if L is nearly diagonal. By using the Gershgorin circle theorem,
it is simple to produce bounds for the largest and smallest eigenvalues of L. If
$\hat{E}$ is a good approximation to the true invariant subspace E, then the generalized
Rayleigh quotient L will be very close to diagonal, so that the Gershgorin bounds will
be very close to the actual extremal eigenvalues of L.
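
A minimal sketch of the Gershgorin step, assuming NumPy and a symmetric Rayleigh quotient L that has already been formed; the function name `gershgorin_extremes` is hypothetical. Each eigenvalue lies in a disc centered on a diagonal entry with radius equal to the sum of the absolute off-diagonal entries in that row, so the extreme disc edges enclose the whole spectrum.

```python
import numpy as np

def gershgorin_extremes(L):
    """Return (lower, upper) bounds that enclose every eigenvalue of the symmetric matrix L."""
    L = np.asarray(L, dtype=float)
    centers = np.diag(L)
    radii = np.sum(np.abs(L), axis=1) - np.abs(centers)   # off-diagonal row sums
    return np.min(centers - radii), np.max(centers + radii)
```

For a signal subspace, the lower bound can stand in for the smallest eigenvalue of L in the denominator of the Tan Θ bound; for a noise subspace, the upper bound plays the corresponding role. The cost is $O(N_E^2)$ rather than the $4N_E^3/3$ of a full eigenvalue computation.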

Computing  the  Norm  of  the  Residual 

The Sin Θ and related theorems require that $\|\sin\Theta\|$ and $\|D\|$ be computed
using a unitarily invariant norm. The choice of norm affects both our interpretation of
the bound on $\|\sin\Theta\|$, and the amount of computation required to compute $\|D\|$. The
interpretation of the effects of inaccuracy of subspace estimates is simplest when the
2-norm is used, so this is the norm we will employ. This means that the calculations
will yield a bound for $\|\sin\Theta\|_2$, and that we must either explicitly calculate, or find
an upper bound for, $\|D\|_2$.

Any unitarily invariant norm may be computed from the singular values of D,
but since computing the singular values requires $O(M N_E^2 + N_E^3)$ operations, we will
also need a more efficient method for obtaining a unitarily invariant norm of D when
$N_E$ is large. We will first consider the case where $N_E$ is small, so that it is practical
to compute the singular values explicitly.

The two standard techniques for computing the singular values of a matrix are
due to Golub and Reinsch [37] and Chan [89]. The numbers of operations required
for the two techniques are approximately $4M N_E^2 - 4N_E^3/3$ for the Golub-Reinsch
approach, and $2M N_E^2 + 2N_E^3$ for the Chan algorithm, so that Chan's approach, also
referred to as the R-SVD, is preferred when $M > 5N_E/3$. Since this is by far the
most common situation in subspace estimation, the R-SVD is the preferred algorithm.
Using the results of this computation, we can compute any unitarily invariant norm
of D, including the 2-norm, which is equal to the largest singular value.

Explicit computation of all the singular values of D gives $\|D\|_2$ exactly, producing
the tightest bounds on the error in the subspace, but it is clear from the
presence of the $N_E^3$ term in the operations count that this approach will become
impractical if $N_E$ is large. In these cases, one alternative is to compute the Frobenius
norm $\|D\|_F$, which is the square root of the sum of the squares of the elements of D.
Because the Frobenius norm is unitarily invariant, it is also equal to the square root
of the sum of the squares of the singular values of D, and therefore

\[ \|D\|_2 \le \|D\|_F \le \sqrt{N_E}\,\|D\|_2, \]

which sets an upper limit to the degradation of the subspace bound due to the use
of the Frobenius norm. Computation of $\|D\|_F$ requires $2M N_E$ operations, and so is
much more efficient than computing the singular values. Similar bounds that require
even less computation are provided by the relations

\[ \|D\|_2 \le \sqrt{M}\,\|D\|_\infty = \sqrt{M}\, \max_i \sum_{j=1}^{N_E} |D(i,j)| \]

and

\[ \|D\|_2 \le \sqrt{N_E}\,\|D\|_1 = \sqrt{N_E}\, \max_j \sum_{i=1}^{M} |D(i,j)|, \]

for norms that require only $M N_E$ floating-point operations to compute.
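
The three inexpensive upper bounds on $\|D\|_2$ can be compared in a few lines. This is an illustrative sketch assuming NumPy; the function name `two_norm_upper_bounds` is hypothetical.

```python
import numpy as np

def two_norm_upper_bounds(D):
    """Upper bounds on the 2-norm of an M x N_E matrix D that avoid an SVD."""
    D = np.asarray(D, dtype=float)
    M, NE = D.shape
    return {
        "frobenius": np.sqrt(np.sum(D * D)),                       # ||D||_F
        "inf": np.sqrt(M) * np.max(np.sum(np.abs(D), axis=1)),     # sqrt(M) * max row sum
        "one": np.sqrt(NE) * np.max(np.sum(np.abs(D), axis=0)),    # sqrt(N_E) * max column sum
    }
```

Taking the smallest of the three values gives the tightest bound available without computing singular values.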

Bounding  the  Gap  Between  Subspaces 

The final element in the computation of the bounds is the determination of
the gap between the eigenvalues associated with the computed subspace and those
associated with the exact complementary subspace; that is, $\hat\lambda_N - \lambda_{N+1}$ or
$\lambda_N - \hat\lambda_{N+1}$. In Chapter 4, the locations of the eigenvalues of the exact matrix
were not known, and it was necessary to use the weaker Sin 2Θ theorem. When testing
the accuracy of computed results, we can use the brackets on the eigenvalues of R to
bound the gap:

\[ \frac{\|D_S\|}{\hat\lambda_N - \lambda_{N+1}} \le \frac{\|D_S\|}{\hat\lambda_N - u_{N+1}}, \]

\[ \frac{\|D_N\|}{\lambda_N - \hat\lambda_{N+1}} \le \frac{\|D_N\|}{l_N - \hat\lambda_{N+1}}, \]

where $l_N$ and $u_{N+1}$ are the lower bound for $\lambda_N$ and the upper bound for $\lambda_{N+1}$
computed during the bracketing phase.

These bounds rely on the accuracy of the bracketing interval; they might be
in error if a single value $u_{N+1}$ or $l_N$ is incorrect. It is possible to eliminate even this
small possibility of error by computing one extra eigenvalue: either the largest of the
noise eigenvalues $\hat\lambda_{N+1}$ or the smallest of the signal eigenvalues $\hat\lambda_N$, depending
on the technique being used. By finding an error bound for the computed value
$\hat\lambda_{N+1}$ or $\hat\lambda_N$, we can determine the gap much more accurately, making the bounds
on the subspace error tighter and assuring that no single error could cause the bound
computation to fail.

The same technique may be used to assess the accuracy of any of the computed
eigenvalues, although we are normally not as concerned with the accuracy of
eigenvalues because most subspace techniques use them only to identify the desired
invariant subspace. A residual bound for individual eigenvalues and eigenvectors is
provided by the following theorem [41, p. 258].

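Theorem 7.3 Let R be a real symmetric matrix, and $(\hat\lambda, \hat{e})$ an approximation to an
eigenvalue of R and the associated eigenvector. If the residual vector d is defined as

\[ d = R\hat{e} - \hat\lambda\hat{e}, \]

then there is an eigenvalue $\lambda_i$ of R, with associated eigenvector $e_i$, for which

\[ |\lambda_i - \hat\lambda| \le \min\left\{ \|d\|_2,\ \frac{\|d\|_2^2}{\delta} \right\} \]

and

\[ \sin\angle(e_i, \hat{e}) \le \frac{\|d\|_2}{\delta}, \]

where

\[ \delta = \min_{j \ne i} |\lambda_j - \hat\lambda|. \]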
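
A small numerical sketch of these single-eigenpair bounds, assuming NumPy, a unit-norm approximate eigenvector, and (for the quadratic eigenvalue bound) that $\hat\lambda$ is the Rayleigh quotient of $\hat{e}$. The function name `residual_bounds` is hypothetical, and in practice `delta` would be a lower bound on the gap obtained from the bracket intervals rather than the exact value.

```python
import numpy as np

def residual_bounds(R, lam_hat, e_hat, delta):
    """Return (eigenvalue error bound, bound on the sine of the eigenvector angle)."""
    e = np.asarray(e_hat, dtype=float)
    e = e / np.linalg.norm(e)            # the bounds assume a unit-norm vector
    d = R @ e - lam_hat * e              # residual vector d
    dn = np.linalg.norm(d)
    return min(dn, dn ** 2 / delta), dn / delta
```

When R is Toeplitz, the product `R @ e` would be replaced by the FFT-based multiplication, so that the test costs $O(M \log M)$ per eigenpair.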

If the smaller of the intervals $(\hat\lambda - \|d\|_2,\ \hat\lambda + \|d\|_2)$ and
$(\hat\lambda - \|d\|_2^2/\delta,\ \hat\lambda + \|d\|_2^2/\delta)$ does not overlap two bracket
intervals, then we can be certain which eigenvalue $\lambda_i$ is the closest to $\hat\lambda$.
In practice, this bound is almost always much smaller than $u_i - l_i$, so it generally
produces a tighter bound on the gap, and therefore on the subspace error.

Comparing a Toeplitz Estimate with the Covariance Estimate

The  sections  above  have  been  concerned  with  verifying  that  the  eigenvectors 
and  eigenvalues  of  a Toeplitz  matrix  have  been  computed  accurately.  We  now  turn 
to  another  question  that  may  be  answered  using  a residual  bound:  how  close  are  the 
invariant  subspaces  of  a Toeplitz  estimate  of  the  autocorrelation  matrix  to  those  of 
the  covariance  estimate?  In  Chapter  4,  we  discussed  the  use  of  a Toeplitz  estimate 
for the autocorrelation matrix in place of the covariance estimate normally used in
the subspace methods. There, it was shown that a Toeplitz estimate could produce
reasonably accurate results, and, in the following chapters, it has been shown that
the computations may be performed much more rapidly if a Toeplitz estimate is used.

By using a residual bound, it is possible to determine how close the computed
eigenvalues and invariant subspaces of a Toeplitz estimate $R_T$ are to those of the
covariance estimate $R_C$, by computing

\[ D = R_C E - E L, \]

using the E and L computed from the Toeplitz matrix $R_T$. Because $R_C = Y^T Y$
(equation (4.6)), where Y is the Toeplitz data matrix defined by equation (4.5), the
matrix products may again be computed using the fast Fourier transform. However,
Y is $(L - M + 1) \times M$, so the FFTs required are longer; the required length is at
least $2L - 2M + 1$. Because two matrix-vector multiplies are necessary, the complete
computation requires $4N_E + 1$ forward or inverse FFTs of this length. Because the
eigenvalues of $R_C$ are not known, it is also necessary to use the residual form of the
Sin 2Θ theorem:

Theorem 7.4 (Sin 2Θ Theorem) If $R$ and $\hat{R}$, their eigendecompositions, their
invariant subspaces, and the residuals $D_S$ and $D_N$ are as defined in Theorem 7.1,
then

\[ \| \sin 2\Theta(\mathcal{S}, \hat{\mathcal{S}}) \| \le \frac{2\|D_S\|}{\hat\lambda_N - \hat\lambda_{N+1}} \]

and

\[ \| \sin 2\Theta(\mathcal{N}, \hat{\mathcal{N}}) \| \le \frac{2\|D_N\|}{\hat\lambda_N - \hat\lambda_{N+1}} \]

for any unitarily invariant norm.

The resulting bound on the angle between the two subspaces makes it possible
to assess not only the accuracy of the Toeplitz eigendecomposition, but also the effect
of using a Toeplitz autocorrelation estimate. Numerical experiments indicate that
the bound on the largest angle between the subspaces obtained by this technique is
relatively tight, almost always within a factor of five of the exact maximum angle
$\Theta_1$. Because of this, testing the accuracy in this way should lead to an accurate
assessment of the effects of using a Toeplitz autocorrelation estimate.

Frequency  Estimation  Accuracy 

The tests above produce bounds on the error in the estimated subspace E. It
is natural to ask how the error in a subspace is related to the error in a frequency
estimate derived from that subspace. For estimators such as MUSIC that compute
frequency estimates as the minima of a projection onto a subspace, there is a
particularly useful relation between subspace accuracy and the minimization used to find
the frequency estimate. This relation is based on the following theorem [41].

Theorem 7.5 Let the matrices $E$ and $\hat{E}$ both have $N_N$ orthonormal columns. The
orthogonal projectors onto the subspaces $\mathcal{E}$ and $\hat{\mathcal{E}}$ spanned by their
columns are $P = E E^T$ and $\hat{P} = \hat{E}\hat{E}^T$, respectively. Then

\[ \|P - \hat{P}\|_2 = \|\sin\Theta(\mathcal{E}, \hat{\mathcal{E}})\|_2 = \sin\Theta_1(\mathcal{E}, \hat{\mathcal{E}}). \]

Recall from Chapter 3 that the MUSIC algorithm computes frequency estimates
by locating the minima on the unit circle of the polynomial

\[ p(z) = \mathbf{z}^H E_N E_N^T \mathbf{z}, \]

where $E_N$ is the matrix whose $N_N$ columns are the eigenvectors of the noise subspace,
and $\mathbf{z} = [1, z, z^2, \ldots, z^{M-1}]^T$. It is simple to show that the difference between the
result obtained for the exact noise subspace spanned by $E_N$ and the approximate
noise subspace spanned by $\hat{E}_N$ is given by


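\[
\begin{aligned}
\left| \mathbf{z}^H E_N E_N^T \mathbf{z} - \mathbf{z}^H \hat{E}_N \hat{E}_N^T \mathbf{z} \right|
  &= \left\| \mathbf{z}^H E_N E_N^T \mathbf{z} - \mathbf{z}^H \hat{E}_N \hat{E}_N^T \mathbf{z} \right\|_2 \\
  &\le \|\mathbf{z}\|_2^2 \, \| E_N E_N^T - \hat{E}_N \hat{E}_N^T \|_2 \\
  &= N_N \, \| \sin\Theta(\mathcal{N}, \hat{\mathcal{N}}) \|_2 \\
  &= N_N \sin\Theta_1(\mathcal{N}, \hat{\mathcal{N}}).
\end{aligned}
\]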


This limits the difference between the results obtained with an "exact" set of
eigenvectors and an approximate set, and can be used to bound the accuracy of the
computed frequency estimate. Suppose that the result of a computation using the
approximate matrix $\hat{E}_N$ is a minimum $p(\hat\omega) = p_{\min}$ at the frequency $\hat\omega$. If it
is known that $\sin\Theta_1 \le C$, then the minimum computed using the exact matrix $E_N$
must lie between the points on either side of $\hat\omega$ at which $p(\omega) = p_{\min} + N_N C$. If
the bound is obtained for the difference between the exact and computed subspaces of
a Toeplitz matrix, it gives the maximum difference in the frequency estimate due to
numerical error in the computation of $\hat{E}_N$. If the bound is obtained for the difference
between the computed Toeplitz subspace and the subspace of the covariance estimate,
it gives the maximum difference due not only to numerical error, but also to the use
of a Toeplitz estimate of R.
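
The projector relation of Theorem 7.5 and its effect on the MUSIC function can be checked numerically. The sketch below assumes NumPy and real orthonormal bases; the names `largest_angle_sine` and `music_null_spectrum` are hypothetical. It relies only on the general inequality $|\mathbf{z}^H (P - \hat{P}) \mathbf{z}| \le \|\mathbf{z}\|_2^2 \|P - \hat{P}\|_2$, with $\|\mathbf{z}\|_2^2 = M$ for $z$ on the unit circle.

```python
import numpy as np

def largest_angle_sine(E, E_hat):
    """Sine of the largest principal angle between span(E) and span(E_hat)."""
    # Singular values of E^T E_hat are the cosines of the principal angles.
    cosines = np.linalg.svd(E.T @ E_hat, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - np.min(cosines) ** 2))

def music_null_spectrum(E_noise, omega):
    """Evaluate p = z^H E E^T z at z = exp(j*omega) for a real noise-subspace basis E."""
    M = E_noise.shape[0]
    z = np.exp(1j * omega * np.arange(M))
    proj = E_noise.T @ z
    return float(np.real(np.vdot(proj, proj)))
```

Evaluating `music_null_spectrum` with an exact and a perturbed basis on a frequency grid shows the two curves differing by no more than `M * largest_angle_sine(E, E_hat)` at any frequency.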

Verifying  Generalized  Eigendecompositions 

Extensions of the Sin Θ and Sin 2Θ theorems to the generalized eigenvalue
problem have been developed by Sun [90,91]. Unfortunately, these extensions include
only perturbation forms of these theorems, not the residual forms required for
efficiently verifying the accuracy of generalized eigendecompositions. One bound that
still holds is a generalized form of Theorem 7.3: for any approximate generalized
eigenvalue and eigenvector $(\hat\lambda, \hat{e})$, there is an eigenvalue $\lambda_i$ such that

\[ |\lambda_i - \hat\lambda| \le \frac{\|(R - \hat\lambda\Sigma)\hat{e}\|_{\Sigma^{-1}}}{\|\Sigma\hat{e}\|_{\Sigma^{-1}}}, \]

where the norm $\|x\|_{\Sigma^{-1}} = (x^T \Sigma^{-1} x)^{1/2}$. Computing this bound is made more
difficult by this more complex norm; although the denominator is equal to
$(\hat{e}^T \Sigma \hat{e})^{1/2}$, to compute the numerator, it is necessary to compute the inverse
of $\Sigma$ and employ the Gohberg-Semencul relation.

Although the lack of a residual bound for generalized invariant subspaces makes
verification of the accuracy of a computed subspace difficult, in practice, the computed
generalized eigenvectors have been found to be of high quality as long as $\Sigma$ is
well conditioned. In cases where it is necessary to be assured of the accuracy of
generalized eigendecompositions, it is possible to employ the $O(M^2)$ look-ahead Levinson
algorithm described in the previous chapter as the basis for the computation, ensuring
that the results remain uncorrupted by instability.


CHAPTER  8 

ESTIMATING  THE  NUMBER  OF  SIGNALS 

To  use  the  fast  Toeplitz  algorithms  in  subspace  frequency  estimation,  it  is 
necessary  to  know  how  many  eigenvalues  and  eigenvectors  must  be  computed;  this 
is  equivalent  to  knowing  the  number  of  signals  present  in  the  input  data.  In  some 
cases,  the  number  is  known  a priori,  so  that  the  fast  algorithms  may  be  used  exactly 
as  described  in  Chapter  5.  In  other  cases,  however,  it  is  necessary  to  estimate  the 
number  of  signals;  this  generally  requires  some  additional  computation  beyond  that 
needed  when  the  number  of  signals  is  known. 

As discussed in Chapter 3, the most widely used estimators for the number of
signals, the AIC (equations (3.10), (3.12)) and the MDL (equations (3.11), (3.13)),
are of the form

\[ h(\lambda, n, M, L) = f(n, M, L) \log(\alpha(\lambda, n, M, L)) + p(n, M, L), \]

where M is the dimension of R, $\lambda$ is the set of eigenvalues of R, n is the hypothesized
number of signals, and L is the number of points in the data record. The estimated
number of signals $\hat{N}$ is the value of n that minimizes $h(\lambda, n, M, L)$. The function
$\alpha$ in these estimators is the ratio of the geometric and arithmetic means of the
$M - n$ smallest eigenvalues of R (which must all be positive for this technique to be
applied), and p is a penalty function that measures the complexity of the model. The
function $\alpha$ is a measure of the inequality of the smallest eigenvalues; if all of the
eigenvalues are equal, $\alpha = 1$, and since all of the eigenvalues are positive, $\alpha > 0$.
Because all of the noise subspace eigenvalues of the exact autocorrelation matrix are
equal (for white noise), the nearness of $\alpha$ to one is related to the likelihood that the
dimension of the noise subspace is $M - n$. The only part of h that depends on the
autocorrelation estimate R is

\[ \alpha(\lambda, n, M, L) = \frac{\left( \prod_{i=n+1}^{M} \lambda_i \right)^{1/(M-n)}}{\frac{1}{M-n} \sum_{i=n+1}^{M} \lambda_i}. \]


When conventional methods are used to compute the eigenvalues, all of the
$\lambda_i$ are computed simultaneously, so that the value of h may then be computed for all
n. Rather than computing each eigenvalue of R, fast Toeplitz eigendecomposition
algorithms compute exact values for a subset of the $\lambda_i$, and find bracket intervals
that contain each of the remaining eigenvalues. These bracket intervals can be used
to estimate the number of signals using conventional techniques such as the AIC and
MDL.
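
For reference, the conventional computation with all eigenvalues known can be sketched as below. The specific f and p used here are the widely cited Wax-Kailath forms of the AIC and MDL, which is an assumption: equations (3.10) to (3.13) are not reproduced in this chapter and may use different penalty terms. All function names are hypothetical.

```python
import math

def alpha(eigs, n):
    """Ratio of geometric to arithmetic mean of the M - n smallest (positive) eigenvalues."""
    tail = sorted(eigs, reverse=True)[n:]
    log_gm = sum(math.log(x) for x in tail) / len(tail)
    am = sum(tail) / len(tail)
    return math.exp(log_gm) / am

def mdl(eigs, n, L):
    """Assumed MDL form: f = -L(M - n), p = n(2M - n) log(L) / 2."""
    M = len(eigs)
    return -L * (M - n) * math.log(alpha(eigs, n)) + 0.5 * n * (2 * M - n) * math.log(L)

def aic(eigs, n, L):
    """Assumed AIC form: f = -2L(M - n), p = 2n(2M - n)."""
    M = len(eigs)
    return -2 * L * (M - n) * math.log(alpha(eigs, n)) + 2 * n * (2 * M - n)

def estimate_num_signals(eigs, L, criterion=mdl):
    """Return the n in 0..M-1 that minimizes the chosen criterion."""
    return min(range(len(eigs)), key=lambda n: criterion(eigs, n, L))
```

In both assumed forms f is negative, which is the property the bounding argument in the next section depends on.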


Bounding  the  Number  of  Signals  Estimate 

Instead of calculating $h(\lambda, n, M, L)$ exactly for each n, it is possible to use the
eigenvalue brackets to find upper and lower bounds on the value of $h(\lambda, n, M, L)$. If,
for a certain value of n, say $n'$, it can be shown that the upper bound on
$h(\lambda, n', M, L)$ is less than the lower bound for all the other values of n, then
$\hat{N} = n'$. In this way, the same estimated number of signals $\hat{N}$ can be found, without
explicitly computing all of the eigenvalues of R. To determine $\hat{N}$, then, we will need
to compute upper and lower bounds on h for each n.

For any given n, the only quantity that is not known exactly is $\alpha$, and because
the logarithm is a strictly increasing function of its argument, it is easy to see that
the minimum and maximum values of h will occur when $\alpha$ is at its upper or lower
bound. Since $f(n, M, L) < 0$ for both the AIC and the MDL, the maximum value of
$h(\lambda, n, M, L)$ for a given n will occur when $\alpha(\lambda, n, M)$ is minimized, and the minimum
when $\alpha$ is maximized. The problem has thus been reduced to determining bounds
for $\alpha$ for each n. We will consider the problem in two parts: first, the computation of
the upper and lower bounds for a given value of n, and second, efficient methods for
computing bounds for $n + 1$ using information gained while computing the bounds
for n.
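
The decision rule described above can be sketched as follows, assuming bounds on $\alpha$ are already available for each n. The function names `h_bounds` and `decide` are hypothetical.

```python
import math

def h_bounds(alpha_lo, alpha_hi, f, p):
    """Bounds on h = f*log(alpha) + p, given f < 0 and 0 < alpha_lo <= alpha <= alpha_hi."""
    # Because f is negative, the largest alpha gives the smallest h and vice versa.
    return f * math.log(alpha_hi) + p, f * math.log(alpha_lo) + p

def decide(bounds):
    """bounds[n] = (h_lo, h_hi). Return n' if its upper bound lies below every other
    lower bound; return None if the brackets are still too loose to decide."""
    best = min(range(len(bounds)), key=lambda n: bounds[n][1])
    others = (lo for n, (lo, _) in enumerate(bounds) if n != best)
    return best if all(bounds[best][1] < lo for lo in others) else None
```

A `None` result signals that more eigenvalue information is needed before the estimate can be fixed.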

Computing  Bounds  for  a Single  Value  of  n 

The problem of minimizing h for a single value of n is a bound-constrained
nonlinear optimization problem. Although the exact solution of this problem is
generally too computationally expensive to be practical, it is possible to compute partial
solutions to this problem that yield bounds for h.

The set of upper bounds for each of the $\lambda_j$ will be denoted by the vector u, and
the lower bounds by the vector l, so that

\[ u_j \ge \lambda_j \ge l_j. \]

As usual, the $\lambda_j$ are ordered, so that $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$, and the upper and
lower bounds u and l are ordered in the same way. To be certain that $\log(\alpha)$ is real
valued, it is also necessary to require that $l_M > 0$, which corresponds to requiring
that R be positive definite.

To simplify the notation, we will denote the geometric mean of the last $M - n$
eigenvalues as

\[ G(\lambda, n, M) = \left( \prod_{i=n+1}^{M} \lambda_i \right)^{1/(M-n)} \]

and their arithmetic mean as

\[ A(\lambda, n, M) = \frac{1}{M-n} \sum_{i=n+1}^{M} \lambda_i. \]

Using this notation $\alpha$ is

\[ \alpha(\lambda, n, M) = \frac{G(\lambda, n, M)}{A(\lambda, n, M)}, \]

and the derivative of $\alpha$ with respect to one of the eigenvalues $\lambda_j$ is

\[ \frac{\partial}{\partial\lambda_j}\, \alpha(\lambda, n, M) = \frac{1}{M-n}\, \alpha(\lambda, n, M) \left( \frac{1}{\lambda_j} - \frac{1}{A(\lambda, n, M)} \right). \tag{8.1} \]


For a fixed n, the problem of minimizing or maximizing $\alpha$ exactly requires finding
a vector $\lambda$, each of whose elements lies within its bracket interval, that maximizes
or minimizes $\alpha(\lambda, n, M)$; the vectors that maximize and minimize $\alpha$ will be denoted
$\lambda^+(n)$ and $\lambda^-(n)$, respectively. Some of the relations developed below apply to both
the maximizing and minimizing vectors; for brevity, we will use the notation $\lambda^\pm$ to
indicate that an expression applies to either vector. (Of course, the substitution of
$\lambda^+$ or $\lambda^-$ for $\lambda^\pm$ in these expressions must be consistent.)

For each element $\lambda_j^\pm(n)$ of either the maximizing vector $\lambda^+(n)$ or the minimizing
vector $\lambda^-(n)$, one of the following three mutually exclusive conditions is true [92, p.
77]:

A: $\lambda_j^\pm(n) = u_j$.

B: $\lambda_j^\pm(n) = l_j$.

C: $u_j > \lambda_j^\pm(n) > l_j$ and $\frac{\partial}{\partial\lambda_j} \alpha(\lambda^\pm, n, M) = 0$.


If we examine equation (8.1), we see that, because $\alpha/(M - n)$ must be positive,
the sign of the derivative with respect to $\lambda_j$ is determined by the relative size of
$\lambda_j^\pm$ and $A(\lambda^\pm, n, M)$. The derivative will be positive if $\lambda_j^\pm < A(\lambda^\pm, n, M)$, zero if
$\lambda_j^\pm = A(\lambda^\pm, n, M)$, and negative if $\lambda_j^\pm > A(\lambda^\pm, n, M)$. The value of
$A(\lambda^\pm, n, M)$ is not known exactly, but it is bounded by $A(u, n, M)$ and $A(l, n, M)$, and
this may enable us to determine some of the elements of $\lambda^+$ and $\lambda^-$.

If the bracket intervals for an eigenvalue $u_j \ge \lambda_j^\pm \ge l_j$ and those for the
arithmetic mean $A(u, n, M) \ge A(\lambda^\pm, n, M) \ge A(l, n, M)$ do not overlap, then
$\partial\alpha/\partial\lambda_j$ cannot be zero for any $\lambda_j$ within the bracket, and it is possible to
determine the values of $\lambda_j^-$ and $\lambda_j^+$ easily. If the partial derivative of $\alpha$ with
respect to $\lambda_j$ is positive for all $\lambda_j$ within the bracket interval, then $\lambda_j^- = l_j$, and
$\lambda_j^+ = u_j$; if the derivative is negative over the entire interval, then $\lambda_j^- = u_j$, and
$\lambda_j^+ = l_j$.

As each element of $\lambda^+$ and $\lambda^-$ is determined in this fashion, the bracket intervals
for $A(\lambda^+, n, M)$ and $A(\lambda^-, n, M)$ are narrowed, possibly allowing other elements
of the minimizing and maximizing vectors to be determined. Since the global maximum
of $\alpha$ is one, and occurs when all of the $\lambda_i$ are equal, we can see that maximizing $\alpha$
requires making the $\lambda_i$ as close to equal as possible (that is, as close to the arithmetic
mean as possible), and minimizing it requires that the $\lambda_i$ be as unequal (as far from
the arithmetic mean) as possible. The iterative determination of components of $\lambda^+$
and $\lambda^-$, and computation of an upper bound $\alpha^+(n)$ on $\alpha(\lambda, n, M)$ is performed by
Algorithm 8.1 below. The computation of a lower bound is performed in exactly the
same fashion.

Algorithm 8.1: Bounding α

    start = M - n + 1
    finish = M
    changed = true
    while changed
        S0 = sum(l(M - n + 1 : start - 1))
        S1 = sum(l(start : finish))
        S2 = sum(min(u(start : finish), l(start - 1)))
        S3 = sum(u(finish + 1 : M))
        A+ = (S0 + S2 + S3)/(M - n)
        A- = (S0 + S1 + S3)/(M - n)
        changed = false
        for i = start : finish
            if l(i) > A+
                start = i + 1
                changed = true
            else if u(i) < A-
                finish = i - 1
                changed = true
            end
        end
    end
    G+ = (prod(l(M - n + 1 : start - 1)) * prod(min(u(start : M), l(start - 1))))^(1/(M - n))
    α+(n) = min(1, G+/A-)

After determining as many of the components of $\lambda^+$ as possible, Algorithm 8.1
computes the maximum value of $\alpha$ by taking the ratio of the largest possible geometric
mean to the smallest possible arithmetic mean (or the reverse, when the minimum of
$\alpha$ is being calculated). This is a suboptimal choice, since the values assumed for the
unknown eigenvalues are different in the computation of the two means. Finding a
consistent maximizing or minimizing vector requires much more computation, but if
the number of remaining eigenvalues is small enough, it may be feasible to solve the
exact bound-constrained minimization or maximization problem. This approach is
only viable when the number of elements whose values are not known is small (5-20).
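
The ratio step alone can be illustrated with a much simpler sketch than Algorithm 8.1. The code below omits the iterative refinement entirely and applies the crude interval bound to every one of the last M - n eigenvalues, using only the fact that both the geometric and the arithmetic mean increase with each eigenvalue. The function name `alpha_bounds` is hypothetical, and the resulting bounds are valid but generally looser than those of Algorithm 8.1.

```python
import math

def alpha_bounds(lower, upper, n):
    """Crude bounds on alpha from brackets lower[i] <= lambda_i <= upper[i].

    Both lists are in decreasing eigenvalue order and all entries are positive;
    only the last M - n entries are used.
    """
    lo_tail, up_tail = lower[n:], upper[n:]
    k = len(lo_tail)

    def geometric_mean(v):
        return math.exp(sum(math.log(x) for x in v) / k)

    def arithmetic_mean(v):
        return sum(v) / k

    alpha_lo = geometric_mean(lo_tail) / arithmetic_mean(up_tail)
    alpha_hi = min(1.0, geometric_mean(up_tail) / arithmetic_mean(lo_tail))
    return alpha_lo, alpha_hi
```

With brackets of width 0.2 around eigenvalues near one, this crude upper bound is already clamped at 1, which illustrates why narrowing the brackets or fixing components of the extremal vectors is worthwhile.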

Narrowing the Bracket Intervals

When the bracket intervals resulting from the computation of a small number
of eigenvalues are used in the above procedure, the resulting bounds on h are often
too loose to allow the number of signals estimate to be determined. This problem
is particularly common when a signal subspace frequency estimator is being used,
because these estimators compute the largest eigenvalues of R first, while the number
of signals estimators employ the smallest eigenvalues. It is of course possible to
continue to compute eigenvalues until a large number are known exactly, and the
bounds on those remaining are narrow enough that the number of signals estimate
may be determined. Generally, more than half of the eigenvalues must be computed
before this occurs, and the cost per eigenvalue is approximately 8 solutions of the
Yule-Walker equation.

In many cases, the number of signals may be estimated with many fewer
solutions by employing the spectrum slicing technique described in Chapter 6 at
selected values of $\lambda$. Assume that the eigenvalues that have not yet been computed
exactly lie in the range $\lambda_U > \lambda_i > \lambda_L$. By slicing the spectrum within this range at
intervals of $\delta$, the bracket interval for every unknown eigenvalue can be reduced to
no more than $\delta$. The cost for this is $(\lambda_U - \lambda_L)/\delta$ slices. In contrast to computing
additional eigenvalues, each slice costs only one Yule-Walker solution, and the new
information obtained is spread across the entire region, rather than being concentrated
near a certain group of eigenvalues.
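
A single slice can be sketched as an inertia count. This is an illustration under stated assumptions, not the Chapter 6 procedure itself: it assumes the slice is obtained from a plain Levinson-Durbin recursion applied to $R - \mu I$, whose successive prediction-error values are ratios of consecutive leading principal minors, so that the number of negative values equals the number of eigenvalues below $\mu$. The function name `count_eigs_below` is hypothetical, and no look-ahead is included, so a shift that makes a leading minor singular is rejected.

```python
def count_eigs_below(t, mu):
    """Count eigenvalues below mu for the symmetric Toeplitz matrix with first column t."""
    r = [float(x) for x in t]
    r[0] -= mu                      # first column of R - mu*I
    a = [1.0]                       # current prediction-error filter
    err = r[0]                      # ratio of consecutive leading minors
    count = 1 if err < 0 else 0
    for k in range(1, len(r)):
        if err == 0.0:
            raise ZeroDivisionError("singular leading minor; shift mu slightly")
        acc = sum(a[j] * r[k - j] for j in range(k))
        refl = -acc / err           # reflection coefficient
        ext = a + [0.0]
        a = [x + refl * y for x, y in zip(ext, reversed(ext))]
        err *= 1.0 - refl * refl
        count += 1 if err < 0 else 0
    return count
```

Calling this at shifts spaced $\delta$ apart between $\lambda_L$ and $\lambda_U$ assigns every uncomputed eigenvalue to an interval of width at most $\delta$, at one recursion per slice.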

When this technique is employed, it is often possible to determine the number
of signals estimate with 20-50 additional Yule-Walker solutions, equivalent to the
computation of 3-6 additional eigenvalues. In addition, it is possible to subdivide the
interval coarsely, check whether the number of signals estimate can be determined,
and divide the interval more finely only if necessary.

Computing Bounds for Consecutive Values of n

Once a set of brackets for $\lambda^+(n)$ and $\lambda^-(n)$ has been determined, it is possible
to bracket $\lambda^+(n + 1)$ and $\lambda^-(n + 1)$ with less computation. The bracket computation
for n produces a pair of indices start and finish that indicate the region within which
the elements of the minimizing or maximizing vector are not exactly known. This
region is occupied by those components of $\lambda^\pm$ that are within the range of possible
values for the mean $A(\lambda^\pm, n, M)$.

Because the $\lambda_i$ are in decreasing order, $A(\lambda^\pm, n + 1, M) \le A(\lambda^\pm, n, M)$; this
means that the values of start and finish cannot increase. The additional computation
required for $n + 1$ is simply the examination of the vectors near these boundaries;
if the range of possible values for the arithmetic mean has moved downward so that
it no longer overlaps an eigenvalue bracket, then start has decreased, and the value
of some elements of the minimizing or maximizing vector may now be found exactly.
Similarly, if the range of values for the arithmetic mean has moved downward to
overlap a bracket interval, then that element's value can no longer be determined
exactly.

Although the principal focus of this work is on the subspace frequency estimators
themselves, rather than the number of signals estimation problem, this chapter
has outlined a simple technique for obtaining estimates of the number of signals when
the fast Toeplitz eigensolvers are used. Unfortunately, unlike the standard
implementations of the subspace techniques, the fast Toeplitz implementations usually require
additional computation if the number of signals is not known a priori. The possibility
of a more elegant solution to this problem is a subject for future investigation.


CHAPTER  9 

COMPARISON  OF  CONVENTIONAL  AND  FAST  METHODS 


In  this  chapter,  the  fast  Toeplitz  eigendecomposition  methods  are  used  in  an 
implementation of the ESPRIT method for frequency estimation. If a Toeplitz autocorrelation
estimate is used, the fast implementation provides a much more efficient
method of computing subspace frequency estimates. As we saw in Chapter 4, when
a Toeplitz matrix is used in place of a covariance matrix of the same size, some performance
degradation occurs. Because of the superior efficiency of the fast Toeplitz
algorithms,  however,  it  is  often  possible  to  use  a larger  matrix  and  still  reduce  the 
computation  time.  We  will  see  below  that  in  many  cases,  the  improvement  in  the 
accuracy  of  the  frequency  estimate  due  to  the  use  of  a larger  R more  than  offsets  the 
degradation  due  to  the  use  of  a Toeplitz  estimate.  The  result  is  that  more  accurate 
frequency  estimates  may  be  obtained  in  less  time,  when  compared  to  conventional 
implementations  of  the  subspace  methods. 

The  analysis  in  Chapter  4 also  showed  that  the  relative  performance  of  the 
various  estimators  depended  on  several  characteristics  of  the  input  data,  including 
the  number  of  samples  in  the  data  record,  the  signal-to-noise  ratio,  and  the  frequency 
difference  between  the  sinusoids  in  the  input  data.  To  provide  a clear  illustration  of 
the  effects  of  these  changes,  we  will  consider  cases  in  which  the  effect  of  each  of  these 
factors  is  clearly  seen,  examining  the  variations  in  the  accuracy  and  execution  time  of 
the  conventional  and  fast  methods  for  subspace  frequency  estimation.  First,  however, 
a detailed  description  of  the  implementation  of  each  of  these  approaches  will  be  given. 

Implementation of the ESPRIT Algorithm

Two  different  forms  of  the  TLS-ESPRIT  algorithm  described  in  Chapter  3 
have  been  implemented.  The  first  uses  the  most  efficient  conventional  techniques  to 
compute  the  eigendecomposition,  and  may  be  used  with  either  Toeplitz  or  covariance 
estimators  of  the  autocorrelation  matrix;  the  second  uses  the  fast  Toeplitz  eigensolver 
described  in  Chapter  5.  Other  than  the  method  of  computing  the  invariant  subspace, 
the  two  variants  are  identical. 

Eigendecomposition  by  the  Conventional  Method 

The conventional variant calculates the eigenvalues and eigenvectors by standard
methods, using routines from the linear algebra packages LINPACK [93] and
EISPACK  [94,95].  These  methods  make  no  assumption  about  the  structure  of  the 
matrix,  except  for  symmetry;  they  may  be  used  with  any  of  the  autocorrelation  matrix 
estimators  discussed  here. 

In cases where only a few eigenvectors are required, R is reduced to tridiagonal
form using the EISPACK routine tred1, and its eigenvalues are calculated using QL
iteration by the routine tql1. Reduction of an $M \times M$ matrix to tridiagonal form requires
approximately $4M^3/3$ operations; the cost of computing the eigenvalues of the
resulting tridiagonal matrix is negligible by comparison. Once the eigenvalues have
been found, the eigenvectors are calculated by inverse iteration using the LINPACK
routines dsifa and dsisl, which solve symmetric indefinite systems. Each iteration
requires $M^3/3$ operations, and since the eigenvalues are very accurate, only one
iteration is usually needed to produce an accurate eigenvector. The total cost to produce
N eigenvalues and their associated eigenvectors is approximately $(N + 4)M^3/3$
operations.

If many eigenvectors are required, the EISPACK routines tred2 and tql2
are used instead. These also perform Householder reduction to tridiagonal form and
tridiagonal QL iteration, but they accumulate the orthogonal transformations used
to perform these steps, so that when the process is complete, all the eigenvectors and
eigenvalues of R are available. The total computational cost is approximately $9M^3$
operations, so if more than 23 eigenvectors are needed, it is more efficient to use these
routines to compute them all, rather than use inverse iteration.
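
The break-even figure of 23 eigenvectors follows from equating the two operation counts quoted above. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the approximate counts are treated as if they were exact.

```python
def inverse_iteration_cost(n_vectors, m):
    """Approximate operations: tridiagonal reduction plus one
    inverse iteration (M^3/3 each) per requested eigenvector."""
    return (n_vectors + 4) * m**3 / 3


def full_decomposition_cost(m):
    """Approximate operations when all eigenpairs are accumulated."""
    return 9 * m**3


def break_even_vectors():
    """Solve (N + 4) / 3 = 9 for N; the matrix dimension cancels."""
    return 9 * 3 - 4


if __name__ == "__main__":
    m = 129  # illustrative dimension only, not a value taken from the text
    for n in (2, 23, 24):
        print(n, inverse_iteration_cost(n, m), full_decomposition_cost(m))
```

Because both counts scale as the cube of M, the break-even number of eigenvectors does not depend on the matrix dimension.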

Eigendecomposition  by  the  Fast  Method 

The fast variant computes eigenvalues and eigenvectors of a Toeplitz autocorrelation
matrix estimate using the fast and superfast techniques described in Chapter 5.
The crossover point between the fast and superfast Toeplitz solvers was chosen to be
M = 513, since the superfast techniques become more efficient at this point. Thus
computation of N eigenvalues and eigenvectors for M < 513 requires $O(NM^2)$ operations,
and $O(NM \log^2 M)$ operations are needed for M > 513. As described in
Chapter 5, the accuracy of the computed eigenvalues and eigenvectors is controlled
by the tolerance $\epsilon$, which is the precision to which a root of $E(\lambda)$ is located.

The number of Toeplitz solutions necessary to locate an eigenvalue depends on
the tolerance $\epsilon$, as well as on the number of nearby eigenvalues previously computed,
since information from nearby solutions provides the root finder with a better starting
point. For the computations described here, $\epsilon = 10^{-12}$ was used, and the number of
Toeplitz solutions required to locate the first eigenvalue was approximately 10 to 12.
Subsequent nearby eigenvalues required fewer evaluations; the average number of
solutions required for a cluster of contiguous eigenvalues was between 8 and 10. To
compute each solution, the fast and superfast Toeplitz solvers require approximately
$2M^2$ and $8M \log^2 M$ operations, respectively.

Because the accuracy of ESPRIT frequency estimates is entirely determined
by the accuracy of the computed invariant subspace, a single inverse iteration is
used to improve the accuracy of each eigenvector after the associated eigenvalue is
found. For M < 64, Trench’s algorithm is used to solve the resulting (indefinite)
Toeplitz system; for M > 64, it is more efficient to calculate the solution to the
Yule-Walker problem using a fast or superfast Toeplitz solver, and use the Gohberg-Semencul
relation to compute the solution. Since the inverse iteration requires a
single additional solution of the Yule-Walker problem, the total cost for locating a
cluster of N eigenvalues and their associated eigenvectors is approximately $20NM^2$
for M < 513, and $80NM \log^2 M$ for M > 513.
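
The totals quoted above can be recovered by counting Toeplitz solutions. The sketch below is an illustration under stated assumptions, not the implementation used in this work: it takes 9 root-finding solutions per eigenvalue (an assumed midpoint of the 8 to 10 range), adds one solution for the inverse iteration, treats the logarithm as base 2, and assigns M = 513 itself to the superfast solver. All names are hypothetical.

```python
import math

FAST_CROSSOVER = 513  # crossover dimension quoted in the text


def solve_cost(m):
    """Approximate operations for one Toeplitz solution of dimension m."""
    if m < FAST_CROSSOVER:
        return 2 * m**2                   # fast solver: 2 M^2
    return 8 * m * math.log2(m) ** 2      # superfast solver: 8 M log^2 M


def cluster_cost(n_eigs, m, solves_per_eigenvalue=9):
    """Cost of n_eigs eigenpairs in a cluster.

    Each eigenvalue needs `solves_per_eigenvalue` solutions for root
    finding (assumed value) plus one more for the inverse iteration.
    """
    return n_eigs * (solves_per_eigenvalue + 1) * solve_cost(m)
```

With ten solutions per eigenpair, the two branches reduce to $20NM^2$ and $80NM \log^2 M$, matching the totals given in the text.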

Although  the  residual  computations  described  in  Chapter  7 were  performed  on 
each  of  the  calculated  eigenvalues  and  eigenvectors,  with  the  threshold  on  the  angle 
between the exact and computed eigenvector set at $10^{-3}$, no eigenvalues were rejected
in  any  of  the  trials  performed  here.  In  addition,  empirical  tests  showed  almost  no 
difference  between  the  frequency  estimates  computed  with  fast  techniques  and  those 
computed  using  stable  methods.  If  the  same  Toeplitz  matrix  was  used  to  compute 
frequency  estimates  with  the  conventional  (stable)  methods  and  the  fast  methods,  it 
was  not  unusual  for  the  resulting  frequency  estimates  to  agree  to  15  digits,  and  the 
largest  difference  in  a frequency  estimate  observed  in  several  thousand  trials  was  in 
the  11th  digit.  This  indicates  that  the  fast  techniques  give  very  good  estimates  of  the 
invariant  subspaces,  and  that  the  residual  bounds  need  only  be  used  to  check  for  the 
extremely  rare  cases  where  numerical  instability  causes  large  errors. 
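
To make the idea of a residual check concrete, the sketch below computes the residual of a computed eigenpair using an FFT-based Toeplitz matrix-vector product, and converts it to an angle bound with one standard result for symmetric matrices. This is an assumption-laden illustration: the tests of Chapter 7 are not reproduced in this chapter and may differ, the function names are hypothetical, and the `gap` argument (the distance from the computed eigenvalue to the remaining eigenvalues) must be supplied by the caller.

```python
import numpy as np


def toeplitz_matvec(first_col, x):
    """Product T @ x for the real symmetric Toeplitz matrix T with the
    given first column, via embedding in a circulant of order 2M."""
    t = np.asarray(first_col, dtype=float)
    m = len(t)
    circ = np.concatenate([t, [0.0], t[:0:-1]])
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x, n=2 * m))
    return y[:m].real


def eigenpair_residual(first_col, eigenvalue, eigenvector):
    """Norm of T v - lambda v after normalizing v to unit length."""
    v = np.asarray(eigenvector, dtype=float)
    v = v / np.linalg.norm(v)
    return np.linalg.norm(toeplitz_matvec(first_col, v) - eigenvalue * v)


def angle_bound(residual, gap):
    """Upper bound on the angle between computed and exact eigenvector:
    sin(theta) <= residual / gap."""
    return np.arcsin(min(1.0, residual / gap))
```

An eigenpair would be accepted when `angle_bound` falls below the chosen threshold, for example the value of $10^{-3}$ used here.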

The  ESPRIT  Computation 

Once the desired invariant subspace has been computed, the conventional and
fast implementations perform exactly the same computational steps. The TLS solution
to the ESPRIT equation $E_1 X = E_2$ is computed using the partial TLS algorithm
developed by Van Huffel [96]; this algorithm computes only the portion of the singular
value decomposition necessary to solve the TLS problem, and requires approximately
the same number of operations as the standard TLS solution method using the R-SVD
[37, p. 577]. This step requires about $2MN^2$ operations to compute X.

The eigenvalues of the TLS solution X are computed by reducing it to upper
Hessenberg form using the EISPACK routine elmhes, and computing the eigenvalues
using the routine hqr. The total computation required by this step is approximately
$10N^3$ operations. The TLS-ESPRIT frequency estimates are the arguments of the
eigenvalues of X.
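
The two steps just described can be sketched as follows. This is a minimal illustration, not the implementation used here: it uses a full SVD from NumPy in place of the partial TLS algorithm, assumes that $E_1$ and $E_2$ are the subspace basis with its last and first row removed, and uses a hypothetical function name.

```python
import numpy as np


def tls_esprit_frequencies(subspace):
    """Frequency estimates from an M x N basis of the signal subspace.

    Solves E1 X = E2 in the total least squares sense and returns the
    sorted arguments of the eigenvalues of X.
    """
    e1 = subspace[:-1, :]   # all rows but the last (assumed partition)
    e2 = subspace[1:, :]    # all rows but the first
    n = subspace.shape[1]
    # The right singular vectors of [E1 E2] for the N smallest singular
    # values determine the TLS solution.
    _, _, vh = np.linalg.svd(np.hstack([e1, e2]))
    v = vh.conj().T
    v12 = v[:n, n:]
    v22 = v[n:, n:]
    x = -v12 @ np.linalg.inv(v22)
    return np.sort(np.angle(np.linalg.eigvals(x)))


if __name__ == "__main__":
    # Noise-free check: columns are complex exponentials at known frequencies.
    omegas = np.array([1.88496, 2.01062])
    rows = np.arange(16)[:, None]
    basis = np.exp(1j * rows * omegas[None, :])
    print(tls_esprit_frequencies(basis))
```

In the noise-free check the basis spans the signal subspace exactly, so the estimates should reproduce the two input frequencies to within rounding error.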

Relative  Complexity 

By summing the computational costs given in the section above, we can see
that the cost of estimating the frequencies of N complex signals using the TLS-ESPRIT
algorithm is approximately $2MN^2 + 10N^3$ operations, plus the cost of computing
the desired invariant subspace. Computation of the subspace requires approximately
$(N + 4)M^3/3$ operations for the conventional methods, $20NM^2$ for the fast
methods, and $80NM \log^2 M$ for the superfast methods.

The worst case for the fast and superfast algorithms is when all of the eigenvalues
are computed, that is, N = M. In this case, the fast algorithms are always
more complex than conventional methods. It is interesting to note, however, that the
superfast techniques are asymptotically less complex than the conventional methods
even if all of the eigenvalues and eigenvectors are computed. Because the conventional
methods require about $9M^3$ operations, the superfast methods are less complex than
conventional techniques for M > 1025. In practice, the better locality of reference of
the conventional techniques keeps the superfast methods slower until $M \approx 2049$.
Although subspace frequency estimation rarely requires such large numbers of eigenvectors,
the superfast methods are the first Toeplitz eigendecomposition algorithms for which the
asymptotic complexity is lower than that of the standard methods even when all eigenvalues
and eigenvectors are computed.
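
The crossover near M = 1025 can be checked from the operation counts alone. The sketch below rests on two assumptions that are not stated in the text: the logarithm is taken to base 2, and M is restricted to the form $2^k + 1$, as suggested by the dimensions quoted in this chapter. Function names are hypothetical.

```python
import math


def superfast_full_cost(m):
    """All M eigenpairs by the superfast method: 80 N M log^2 M with N = M."""
    return 80 * m * m * math.log2(m) ** 2


def conventional_full_cost(m):
    """All M eigenpairs by the conventional method: about 9 M^3."""
    return 9 * m**3


def first_superfast_win(max_power=20):
    """Smallest M = 2**k + 1 for which the superfast count is lower."""
    for k in range(1, max_power + 1):
        m = 2**k + 1
        if superfast_full_cost(m) < conventional_full_cost(m):
            return m
    return None
```

Under these assumptions the superfast count is still the larger one at M = 513 and first becomes the smaller one at M = 1025. The counts say nothing about memory access patterns, which is why the measured crossover is higher.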

Simulation Results


In  this  section,  the  results  of  the  fast  and  conventional  variants  are  compared 
for  a variety  of  different  input  signals,  and  the  effects  of  variations  in  data  record 
length,  signal-to-noise  ratio,  and  frequency  spacing  are  examined.  Both  the  accuracy 
of  the  estimates,  in  terms  of  the  mean  squared  error  and  variance,  and  the  efficiency 
of  the  computation,  in  terms  of  the  total  computation  time,  are  compared.  The  cases 
considered  are  summarized  in  Table  9.1  below,  which  also  lists  the  figures  in  which 
the  relevant  results  are  shown. 


Table 9.1: Summary of Test Cases

Case                          Results
Low signal-to-noise ratio     Figures 9.1-9.3
Long data record              Figures 9.4-9.5
High signal-to-noise ratio    Figures 9.6-9.9
Closely spaced signals        Figures 9.10-9.12


The  accuracy  of  the  techniques  is  compared  in  terms  of  the  statistics  of  the 
frequency estimates computed for an ensemble of input signals with identical sinusoidal
components, but different pseudorandom Gaussian noise. To determine these
statistics,  each  problem  was  solved  100  times  using  each  of  the  two  variants.  The 
pseudorandom  number  generator  that  produced  the  noise  in  each  input  data  record 
was  reinitialized  to  the  same  seed  value  for  each  variant;  this  ensured  that  the  input 
data  were  identical  for  the  fast  and  conventional  versions. 
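
The statistics compared in the following sections are related by the identity that mean squared error equals variance plus squared bias, which is why a gap between the two curves indicates bias. A minimal sketch of the ensemble statistics follows; the function name is hypothetical, and the use of a 1/n normalization for the variance is an assumption.

```python
def ensemble_statistics(estimates, true_value):
    """Return (bias, variance, mse) for a list of estimates of true_value."""
    count = len(estimates)
    mean = sum(estimates) / count
    bias = mean - true_value
    # Population (1/n) variance, so that mse == variance + bias**2 exactly.
    variance = sum((e - mean) ** 2 for e in estimates) / count
    mse = sum((e - true_value) ** 2 for e in estimates) / count
    return bias, variance, mse
```

For the experiments described here, `estimates` would hold the 100 frequency estimates produced by one variant for one matrix dimension.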

Low Signal-to-Noise Ratio

The input for the first case is a data record with L = 1000 samples, composed
of two sinusoids with parameters $a_1 = 1$, $\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$,
$\omega_2 = 2.01062$, $\phi_2 = -0.4$, in additive white Gaussian noise with $\sigma^2 = 1.0$. This represents
a relatively low signal-to-noise ratio of -3 dB, and is one of the cases for which we
expect the fast Toeplitz methods to have good performance compared to the standard
variants. The mean squared error in the frequency estimate $\hat{\omega}_1$ is shown in Figure 9.1.
As before, the Bhattacharyya bound will be shown for comparison on graphs of both
mean squared error and variance, even though it only bounds the variance.
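
The quoted signal-to-noise ratio and the reuse of a fixed seed can be illustrated with a short sketch. It assumes real sinusoids of the form a cos(ωn + φ), which is consistent with the -3 dB figure but is not stated explicitly in this chapter; the function names and the seed values are hypothetical.

```python
import math
import random


def make_record(length, sinusoids, noise_variance, seed):
    """Samples of sum_i a_i*cos(w_i*n + phi_i) plus white Gaussian noise.

    sinusoids is a list of (amplitude, angular_frequency, phase) tuples.
    """
    rng = random.Random(seed)  # the same seed reproduces the same noise
    sigma = math.sqrt(noise_variance)
    return [
        sum(a * math.cos(w * n + phi) for a, w, phi in sinusoids)
        + rng.gauss(0.0, sigma)
        for n in range(length)
    ]


def snr_db(amplitude, noise_variance):
    """Per-sinusoid SNR in dB; a real sinusoid has power amplitude**2 / 2."""
    return 10 * math.log10(amplitude**2 / 2 / noise_variance)


LOW_SNR_CASE = [(1.0, 1.88496, 0.3), (1.0, 2.01062, -0.4)]
```

With unit amplitude and unit noise variance, `snr_db` gives about -3 dB. Calling `make_record` twice with the same seed returns identical records, which is the property used to give both variants the same input data.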

Even  for  this  relatively  small  data  record  length,  the  performance  of  the  fast 
Toeplitz  method  is  competitive  with  the  conventional  technique,  with  an  increase 
in  the  mean  squared  error  due  to  the  use  of  a Toeplitz  autocorrelation  estimate  of 
less  than  a factor  of  three.  As  was  the  case  in  Chapter  4,  the  additional  error  is 
almost  entirely  due  to  bias,  as  may  be  seen  from  the  comparison  of  the  variances  in 
Figure  9.2. 

Of  course,  the  reason  for  employing  a Toeplitz  autocorrelation  estimate  is  to 
reduce  the  time  required  to  estimate  the  frequency.  When  the  performance  of  the 
two  techniques  is  compared  on  this  basis,  we  can  see  that  the  fast  method  shows  a 
small  advantage  for  the  larger  autocorrelation  matrices,  and  is  competitive  even  for 
the smallest matrices considered, for which M = 33. The performance as a function
of  computation  time  is  shown  in  Figure  9.3. 

Long  Data  Records 

The accuracy of the fast Toeplitz techniques approaches that of the conventional
variants as the number of samples in the data record becomes large. In addition,
the efficiency advantage of the fast variants becomes much larger as M increases. We
would expect, then, that the fast variants would show the greatest advantage over
the conventional implementations when long data records are used. Figure 9.4 shows
the mean squared error in the frequency estimates for such a case, where the signal
parameters are $a_1 = 1$, $\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$, $\omega_2 = 2.01062$, $\phi_2 = -0.4$, and
$\sigma^2 = 100$. The record length L is 40000 samples.


Figure 9.1: Mean Squared Error in $\hat{\omega}_1$ versus Matrix Dimension M for the
Conventional (solid line) and Unbiased Fast Toeplitz (dashed line) ESPRIT
Methods, for the low SNR case

Figure 9.2: Variance in $\hat{\omega}_1$ versus Matrix Dimension M for the Conventional (solid
line) and Unbiased Fast Toeplitz (dashed line) ESPRIT Methods, for the low SNR
case

Figure 9.3: Mean Squared Error in $\hat{\omega}_1$ versus Computation Time (s) for the
Conventional (solid line) and Unbiased Fast Toeplitz (dashed line) ESPRIT
Methods, for the low SNR case

Because the number of samples is large, the Toeplitz estimate of the autocorrelation
is very close to the covariance estimate; this represents a very favorable case
for the fast Toeplitz techniques. (The signal-to-noise ratio is also low, but this has a
much smaller effect than the data record length.) The mean squared errors in the estimates
produced by the fast and conventional variants are virtually identical, as seen
in  Figure  9.4;  the  mean  difference  is  less  than  5%.  (Because  conventional  methods 
require  a large  amount  of  computation,  the  conventional  estimates  were  computed 
only  up  to  M = 1025,  while  the  fast  variants  were  computed  up  to  M = 16385.) 
Both  estimates  have  negligible  bias,  and  due  to  the  low  signal-to-noise  ratio,  neither 
approaches  the  Bhattacharyya  bound. 

The  efficiency  advantage  of  the  fast  variants  becomes  compelling  for  M > 256. 
For large data records, the fast variants make it possible to compute eigendecompositions
of matrices which are far larger than the largest practical with conventional
techniques,  and  so  the  performance  disadvantage  of  using  a Toeplitz  estimate  may  be 
overcome  by  increasing  M.  Figure  9.5  shows  the  mean  squared  error  as  a function  of 
execution  time,  where  the  dramatic  advantage  of  the  fast  techniques  is  apparent:  for 
the  same  execution  time,  the  mean  squared  error  of  the  fast  Toeplitz  estimate  is  as 
much  as  two  orders  of  magnitude  less  than  that  of  the  conventional  variant. 

The computation time for the fast variant with M = 16385 was approximately
the same as that of the conventional estimate for M = 1025; computing a conventional
estimate for M = 16385 would have required four thousand times more computation.
For the system used to compute these examples, a workstation capable of approximately
8 million floating-point operations per second, the fast Toeplitz estimate with
M = 16385 and the conventional estimate with M = 1025 both required approximately
10 minutes. Computing a conventional estimate with M = 16385 would have
required four weeks, and 1 gigabyte of memory would have been needed for packed
storage of the covariance estimate.
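
The figures in this paragraph follow from cubic scaling of the conventional cost and from the size of a packed symmetric matrix. A minimal sketch of the arithmetic is given below; the function names are hypothetical, and the 8-byte entry size assumes double-precision real storage.

```python
def conventional_scaling(m_small=1025, m_large=16385, minutes_small=10.0):
    """Scale a measured time by (m_large / m_small)**3, the O(M^3) growth.

    Returns the operation ratio and the projected time in weeks.
    """
    ratio = (m_large / m_small) ** 3
    weeks = minutes_small * ratio / (60 * 24 * 7)
    return ratio, weeks


def packed_storage_bytes(m=16385, bytes_per_entry=8):
    """Bytes for the M(M+1)/2 distinct entries of a symmetric matrix."""
    return m * (m + 1) // 2 * bytes_per_entry
```

The ratio comes to a little over four thousand, the projected time to about four weeks, and the packed storage to roughly $1.07 \times 10^9$ bytes, in agreement with the values quoted.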

Figure 9.4: Mean Squared Error in $\hat{\omega}_1$ versus Matrix Dimension M for the
Conventional (solid line) and Unbiased Fast Toeplitz (dashed line) ESPRIT
Methods, for L = 40000

Figure 9.5: Mean Squared Error in $\hat{\omega}_1$ versus Computation Time for the
Conventional (solid line) and Unbiased Fast Toeplitz (dashed line) ESPRIT
Methods, for L = 40000

The large data record case for which the fast variant shows the greatest advantage
is not the only case in which subspace estimates are used, but it is certainly
not  uncommon.  For  example,  the  40000  point  data  record  used  in  the  above  example 
would  be  less  than  one  second  of  CD-rate  digital  audio.  In  general,  if  we  examine 
the  situations  in  which  subspace  estimators  are  used,  we  can  see  that  there  are  two 
factors  which  limit  the  size  of  the  autocorrelation  matrix  estimate.  The  first  is  the 
length  of  the  input  data  record,  since  M must  be  no  greater  than  L;  the  second  is 
the  time  available  for  computation.  If  M must  be  small  because  the  data  record  is 
short,  the  fast  Toeplitz  approach  has  little  advantage,  although  its  performance  may 
still  be  quite  close  to  the  conventional  variant.  If  M is  limited  by  the  computation 
required,  however,  the  fast  techniques  can  improve  the  accuracy  of  the  frequency 
estimate  dramatically,  as  shown  by  the  preceding  example. 

A Less  Favorable  Case 

The  two  examples  above  show  instances  in  which  the  performance  of  the  fast 
variant  is  equal  to  or  superior  to  the  conventional  version.  To  form  a balanced  picture 
of  the  relative  performance  of  the  two  approaches,  it  is  also  interesting  to  examine 
a case where the fast variants are inferior. This case has signal parameters $a_1 = 1$,
$\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$, $\omega_2 = 2.01062$, $\phi_2 = -0.4$, $\sigma^2 = 0.01$, and a record
length L = 2000, representing a high signal-to-noise ratio and a relatively short data
record.  In  contrast  to  the  earlier  cases,  this  is  close  to  the  worst  case  for  the  fast 
Toeplitz  methods.  The  mean  squared  error  is  shown  in  Figure  9.6.  (In  the  two 
previous  cases,  the  choice  of  Toeplitz  estimator  had  very  little  effect  on  the  results; 
since  the  choice  of  estimator  has  a larger  effect  in  this  and  the  following  cases,  the 
results  of  using  both  the  biased  and  unbiased  Toeplitz  autocorrelation  estimates  are 
shown.) 
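
For reference, the two Toeplitz estimators can be sketched as follows. The definitions used in this work are given in Chapter 4 and are not reproduced here, so the sketch assumes the standard textbook forms for real data: lag products summed over the record, divided by L for the biased estimate and by L - k for the unbiased estimate at lag k. The function name is hypothetical.

```python
def toeplitz_autocorrelation(x, m, biased=True):
    """First column r[0..m-1] of a Toeplitz autocorrelation estimate.

    x is a real data record of length L >= m. The full matrix has
    entry (i, j) equal to r[abs(i - j)].
    """
    length = len(x)
    column = []
    for lag in range(m):
        total = sum(x[n + lag] * x[n] for n in range(length - lag))
        divisor = length if biased else length - lag
        column.append(total / divisor)
    return column
```

The two estimates differ only in the divisor, so they coincide at lag zero and diverge as the lag approaches the record length. This is consistent with the choice of estimator mattering more for short records.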

Figure 9.6: Mean Squared Error in $\hat{\omega}_1$ versus Matrix Dimension for the
Conventional (solid line), Biased Fast Toeplitz (dotted line), and Unbiased Fast
Toeplitz (dashed line) ESPRIT Methods, for the high SNR, short data record case

Although the mean squared error in the fast estimates is at least a factor of
ten  larger  than  the  conventional  estimate,  almost  all  of  the  increase  in  error  for  the 
fast  techniques  is  due  to  the  increased  bias  inherent  in  the  use  of  a Toeplitz  matrix. 
When  the  techniques  are  compared  on  the  basis  of  variance,  the  advantage  of  the 
conventional  approach  is  much  smaller,  and  actually  disappears  for  large  M,  as  seen 
in  Figure  9.7. 

As  before,  when  the  two  variants  are  compared  on  the  basis  of  execution  time, 
the fast variants fare better. Figure 9.8 shows the mean squared error versus computation
time; comparison on this basis reduces the advantage of the conventional
method by a factor of approximately 1.5-2. Because the variances of the two approaches
are very similar, the fast variants actually have an advantage when the
variance  is  compared  on  the  basis  of  execution  time.  This  comparison  is  shown  in 
Figure  9.9. 

Even  for  the  unfavorable  situation  considered  in  this  example,  the  fast  variants 
are  competitive  with  the  conventional  implementations  when  variance  of  the  frequency 
estimate,  rather  than  mean  squared  error,  is  the  primary  concern.  This  is  consistent 
with  the  behavior  of  the  subspace  techniques  with  Toeplitz  autocorrelation  estimates 
seen  in  Chapter  4. 

Closely  Spaced  Signals 

Finally, we will examine the performance of the two variants on a signal with
two very closely spaced sinusoids. The input signal for this case has $a_1 = 1$,
$\omega_1 = 1.88496$, $\phi_1 = 0.3$, $a_2 = 1$, $\omega_2 = 1.88621$, $\phi_2 = -0.4$, with $\sigma^2 = 0.0001$ and L = 2000.
The frequency spacing between the two signals is $\Delta f = 0.0002$, which is a factor of
2.5 smaller than the classical resolution limit for L = 2000. The very low noise level
is necessary for any of the estimators to resolve the signals; even the conventional
technique does not resolve them for higher noise powers. The mean squared error as
a function of matrix dimension is shown in Figure 9.10.

Figure 9.7: Variance in $\hat{\omega}_1$ versus Matrix Dimension for the Conventional (solid
line), Biased Fast Toeplitz (dotted line), and Unbiased Fast Toeplitz (dashed line)
ESPRIT Methods, for the high SNR, short data record case

Figure 9.8: Mean Squared Error in $\hat{\omega}_1$ versus Computation Time for the
Conventional (solid line), Biased Fast Toeplitz (dotted line), and Unbiased Fast
Toeplitz (dashed line) ESPRIT Methods, for the high SNR, short data record case

Figure 9.9: Variance in $\hat{\omega}_1$ versus Computation Time for the Conventional (solid
line), Biased Fast Toeplitz (dotted line), and Unbiased Fast Toeplitz (dashed line)
ESPRIT Methods, for the high SNR, short data record case

As  we  saw  in  Chapter  4,  the  use  of  a Toeplitz  estimate  reduces  the  resolution 
of  the  subspace  techniques  somewhat,  although  they  are  still  capable  of  resolving 
signals  closer  than  the  classical  resolution  limit.  The  behavior  of  estimates  computed 
using  the  two  Toeplitz  estimators  is  also  similar  to  that  seen  earlier;  it  is  necessary 
for  M to  be  approximately  L/2  before  the  unbiased  estimator  resolves  the  signals, 
at  which  point  it  abruptly  transitions  to  much  higher  accuracy.  The  biased  Toeplitz 
estimator  resolves  the  signals  at  all  values  of  M,  and  in  fact  has  lower  variance  than 
the  covariance  estimator,  as  seen  in  Figure  9.11. 
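
The spacing quoted for this case, and its relation to the classical resolution limit, can be checked with a few lines of arithmetic. The sketch assumes that the classical limit is 1/L cycles per sample, which is consistent with the factor of 2.5 given in the text; the function names are hypothetical.

```python
import math


def frequency_spacing(omega_1, omega_2):
    """Spacing in cycles per sample between two angular frequencies."""
    return abs(omega_2 - omega_1) / (2 * math.pi)


def classical_resolution_limit(record_length):
    """Assumed Fourier resolution limit of 1/L cycles per sample."""
    return 1.0 / record_length


if __name__ == "__main__":
    spacing = frequency_spacing(1.88496, 1.88621)
    limit = classical_resolution_limit(2000)
    print(spacing, limit, limit / spacing)
```

The two angular frequencies differ by 0.00125 radians per sample, which corresponds to a spacing of very nearly 0.0002 cycles per sample, about 2.5 times finer than 1/L for L = 2000.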

As would be expected, the fast techniques compare better with the conventional
variant when computation time is used as the basis for comparison. The variance
of the frequency estimate versus computation time is shown in Figure 9.12.

The  frequency  estimation  accuracy  of  the  fast  Toeplitz  ESPRIT  algorithm  is 
extremely  similar  to  the  accuracy  of  Toeplitz-based  estimators  examined  in  Chapter  4; 
it  appears  that  the  only  losses  in  performance  are  those  due  to  the  use  of  a Toeplitz 
estimate.  No  loss  of  accuracy  due  to  numerical  instability  of  the  fast  or  superfast 
algorithms  for  Toeplitz  eigendecomposition  has  been  observed,  and  the  residual  tests 
described  in  Chapter  7 verify  that  loss  of  accuracy  due  to  numerical  instability  is  an 
extremely  rare  event. 

Figure 9.10: Mean Squared Error in $\hat{\omega}_1$ versus Matrix Dimension for the
Conventional (solid line), Biased Fast Toeplitz (dotted line), and Unbiased Fast
Toeplitz (dashed line) ESPRIT Methods, for L = 2000 and $\Delta f = 0.0002$

Figure 9.11: Variance in $\hat{\omega}_1$ versus Matrix Dimension for the Conventional (solid
line), Biased Fast Toeplitz (dotted line), and Unbiased Fast Toeplitz (dashed line)
ESPRIT Methods, for L = 2000 and $\Delta f = 0.0002$

Figure 9.12: Variance in $\hat{\omega}_1$ versus Computation Time for the Conventional (solid
line), Biased Fast Toeplitz (dotted line), and Unbiased Fast Toeplitz (dashed line)
ESPRIT Methods, for L = 2000 and $\Delta f = 0.0002$


CHAPTER  10 
CONCLUSIONS 


The  preceding  chapters  have  attempted  to  present  a balanced  assessment  of 
the  fast  Toeplitz  techniques  for  frequency  estimation.  We  have  seen  that,  when  a 
large  input  data  record  is  available,  these  new  variants  of  the  subspace  methods  can 
greatly  reduce  the  computation  time  required  for  subspace  frequency  estimation.  It 
is  also  possible  in  many  circumstances  to  obtain  a more  accurate  estimate  in  the 
same  amount  of  time,  by  using  the  efficiency  improvement  from  the  fast  algorithm  to 
increase  the  size  of  the  autocorrelation  matrix,  rather  than  reduce  the  computation 
time. 

Although  these  techniques  appear  quite  promising,  it  is  important  to  keep 
in mind that the utility of any new method can only be established by applying it
to  a variety  of  actual  problems;  assessing  the  promise  of  a technique  on  the  basis 
of  its  performance  in  situations  designed  to  maximize  its  advantages  and  minimize 
disadvantages  can  only  give  a distorted  view.  To  ensure  that  the  possible  shortcomings 
of  this  method  are  fully  discussed,  a review  of  the  method  from  a critical  perspective 
is presented below.


A Critical  Review 

The  approach  described  here  dramatically  reduces  the  computation  required 
to  compute  eigenvalues  and  eigenvectors  of  large  Toeplitz  matrices,  making  it  possible 
to  compute  subspace  frequency  estimates  much  more  rapidly.  The  use  of  a Toeplitz 
estimate of the autocorrelation is essential to the efficiency of this technique; however,
it is also the cause of its most important limitations.

The most serious difficulty with this method is that, for short or moderate
data  record  lengths,  the  use  of  a Toeplitz  estimate  degrades  the  performance  of  the 
subspace  frequency  estimators  somewhat.  The  degradation  includes  an  increase  in  the 
bias  of  the  frequency  estimates,  and,  for  small  autocorrelation  matrices,  a reduction 
of  the  ability  to  resolve  very  closely  spaced  signals.  In  order  to  preserve  the  high 
resolution  of  the  subspace  techniques,  it  is  necessary  either  to  accept  the  relatively 
high  bias  associated  with  the  use  of  the  biased  estimator,  or  to  use  the  unbiased 
estimator  with  a matrix  dimension  of  at  least  half  the  length  of  the  data  record. 

The  loss  of  performance,  in  comparison  with  the  standard  subspace  methods, 
is  most  severe  when  the  signal-to-noise  ratio  is  high  and  the  data  records  are  short. 
In  many  circumstances,  it  is  possible  to  make  up  for  this  degradation  by  using  a 
larger  autocorrelation  matrix,  and  when  computation  time,  rather  than  data  record 
length,  is  the  limiting  factor,  the  fast  methods  can  give  much  better  accuracy  than 
conventional  methods.  However,  the  results  would  be  much  more  satisfying  if  the 
performance  penalty  could  be  eliminated  completely,  by  developing  a fast  method 
which  can  be  used  with  covariance  estimates.  Two  possible  approaches  for  doing  this 
are  outlined  below.  Because  of  the  performance  loss  inherent  in  the  use  of  a Toeplitz 
estimator,  the  current  technique  is  attractive  principally  in  cases  where  long  data 
records are available, so that the performance of the Toeplitz algorithms is close to
that of the standard versions.

Another  limitation  of  this  method  concerns  its  application  to  a slightly  different 
problem:  in  many  cases,  identical  algorithms  may  be  used  for  frequency  estimation 
and  for  the  closely  related  problem  of  array  bearing  estimation.  Although  this  is 
also  true  for  the  algorithms  presented  here,  there  are  several  restrictions.  First,  the 
requirement  that  a Toeplitz  autocorrelation  matrix  be  used  limits  the  applicability  of 
this  technique  to  uniformly  spaced  linear  arrays.  The  standard  versions  of  the  MUSIC 
and  ESPRIT  estimators  are  useful  for  a much  broader  range  of  array  geometries. 

A more serious limitation arises from purely practical considerations. In the
frequency  estimation  problem,  the  upper  limit  to  the  size  of  the  autocorrelation  matrix 
is  set  by  the  length  of  the  data  record;  in  the  bearing  estimation  problem,  the  limit 
is  set  by  the  size  of  the  array.  While  data  records  with  thousands  of  samples  are 
extremely  common,  uniform  linear  arrays  with  thousands  of  elements  are  rare.  There 
are  very  few  real  arrays  in  existence  large  enough  for  these  algorithms  to  offer  a 
significant  advantage,  and  while  synthetic  aperture  techniques  can  easily  produce 
arrays  with  very  large  numbers  of  synthesized  elements,  synthetic  aperture  systems 
are  principally  used  for  imaging  distributed  targets,  not  for  the  point  target  bearing 
estimation  problem  that  the  subspace  algorithms  solve. 

Areas  for  Further  Research 

The  work  described  here  touches  on  many  fields  of  current  research,  and  has  led 
to  several  promising  avenues  which  will  bear  further  investigation.  A brief  summary 
of  several  of  the  most  important  is  given  in  the  sections  below. 

Parallel  Implementations 

A very  important  aspect  of  the  procedures  discussed  in  this  work  is  their 
inherent  parallelism.  The  computation  of  eigenvalues  and  eigenvectors  proceeds  in 
two  phases:  calculation  of  bracket  intervals,  and  location  of  eigenvalues;  each  of  these 
phases  is  easily  performed  in  parallel.  The  bracket  computation  requires  solution 
of the Yule-Walker equation defined by $T - \lambda I$ over a range of values of $\lambda$. By
dividing the range of values for $\lambda$ into subranges, and assigning each subrange to
a different processor, the bracket computation can be performed in parallel. The
interprocessor communication required is very small, since the vector which defines $T$
can be broadcast to all processors on initialization, and the only further information
needed by each processor is a range of $\lambda$ in which to operate.

Once bracket intervals for the desired eigenvalues have been established, the
computation of each eigenvalue and the associated eigenvector can also proceed in
parallel, with one processor assigned to each eigenvalue. If at least $N$ processors
are available, then the computation of $N$ eigenvalues and eigenvectors of an $M \times M$
Toeplitz matrix can be performed in $O(M \log^2 M)$ time. In addition, each computation
is completely independent, so no interprocessor communication is required.
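
The bracket phase can be illustrated with a short Python sketch. It counts the eigenvalues of $T$ that lie below a trial value $\lambda$ from the signs of the Levinson-Durbin prediction errors of $T - \lambda I$, and hands the trial values to a pool of workers. The function names, the thread pool, and the small test matrix are assumptions made for illustration, not the routines used in this work; breakdown (a zero prediction error) is not handled.

```python
from concurrent.futures import ThreadPoolExecutor

def prediction_errors(t):
    """Levinson-Durbin prediction errors for the symmetric Toeplitz
    matrix whose first row is t (no breakdown handling)."""
    a = []                 # predictor coefficients a_1 .. a_{k-1}
    e = t[0]
    errs = [e]
    for k in range(1, len(t)):
        acc = t[k] + sum(a[j] * t[k - 1 - j] for j in range(k - 1))
        refl = -acc / e    # reflection coefficient
        a = [a[j] + refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        e *= 1.0 - refl * refl
        errs.append(e)
    return errs

def count_below(t, lam):
    """Number of eigenvalues of T less than lam: the number of negative
    prediction errors of T - lam*I (Sylvester's law of inertia)."""
    shifted = [t[0] - lam] + list(t[1:])
    return sum(1 for e in prediction_errors(shifted) if e < 0)

def count_on_grid(t, lams, workers=4):
    """Evaluate the counts for many trial values; each evaluation is
    independent, so the work can be split among workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda lam: count_below(t, lam), lams))

# 4 x 4 example with eigenvalues 2 - 2 cos(k pi / 5), k = 1..4
t = [2.0, -1.0, 0.0, 0.0]
grid = [0.1, 0.5, 1.5, 2.5, 3.1, 3.9]
counts = count_on_grid(t, grid)
```

Two adjacent grid points whose counts differ by one bracket exactly one eigenvalue. Each worker needs only the vector `t` and its own trial values, which matches the low communication requirement described above.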

In  contrast  to  standard  methods  for  parallel  eigenvalue  computation,  such  as 
the  parallel  Jacobi  method  [37,97],  eigenvalue  and  eigenvector  computation  with  the 
fast Toeplitz methods is very coarse-grained, and requires little interprocessor
communication. These characteristics are a great practical advantage, because they allow
the algorithm to be efficiently implemented on a wide range of common multiprocessor
systems, including both SIMD and MIMD systems, and make its efficiency relatively
insensitive to the details of interprocessor communication on a particular system.
Furthermore, considering the advantages offered by DSP processors for computing
the FFTs used in the superfast techniques, and the recent appearance of DSP processors
designed for parallel applications, it is clear that the methods described here
are  promising  candidates  for  parallel  processing  systems  for  frequency  estimation. 

Maximum-Likelihood  Toeplitz  Estimation 

In  addition  to  the  well-known  methods  of  autocorrelation  estimation  described 
in  Chapter  4,  a family  of  new  methods  has  recently  been  developed.  The  covariance 
method  produces  the  maximum  likelihood  estimate  of  the  autocorrelation  matrix 
under  the  stochastic  signal  model,  when  no  constraints  are  imposed  on  the  structure 
of  R.  However,  the  autocorrelation  of  any  signal  consistent  with  the  stochastic  signal 
model  is  known  to  be  Toeplitz,  and  this  fact  can  presumably  be  used  to  produce  a more 
accurate  estimate.  Methods  for  computing  the  maximum  likelihood  Toeplitz  estimate 
of the autocorrelation were first discussed by Burg, Luenberger, and Wenger [98],
whose paper has prompted several other workers to consider the problem of constrained
maximum likelihood estimation of $R$ [99, 100].

The maximum likelihood estimate is found by maximizing the function

\[ f(R_T) = -\log(\det(R_T)) - \mathrm{trace}(R_T^{-1} R_c), \]

where $R_T$ is a Toeplitz estimate, and $R_c$ is the covariance estimate. The maximum
likelihood estimate is the value of $R_T$ which maximizes $f$. Notice that this maximization
requires computing the determinant and inverse of a Toeplitz matrix, and
the product of the inverse of a Toeplitz matrix with $R_c$; efficient algorithms for these
operations were given in Chapter 5. Using these algorithms, $f$ can be evaluated in
$O(M^2 \log M)$ operations.

This  work  has  not  attempted  a comprehensive  study  of  the  performance  of 
the  Toeplitz  maximum  likelihood  method  for  estimation  of  sinusoidal  frequencies. 
However,  it  should  be  noted  that,  in  contrast  to  the  very  promising  results  reported 
by  Williams  [100]  for  the  array  bearing  estimation  problem,  a series  of  simulations 
for  the  frequency  estimation  problem  show  that  the  maximum  likelihood  Toeplitz 
estimator  has  performance  similar  to  the  unbiased  Toeplitz  estimator,  which  is  much 
simpler  to  compute.  Nevertheless,  the  fast  techniques  described  in  Chapter  5 may 
be  useful  for  improving  the  efficiency  of  calculating  the  maximum  likelihood  Toeplitz 
estimator  in  other  contexts. 

Displacement  Rank  Algorithms  for  Covariance  Matrices 

A group  of  workers  associated  with  Kailath  have  developed  algorithms  for  the 
solution  of  matrix  equations  in  which  the  matrix  has  low  displacement  rank,  that  is, 
where the rank of $A - DAD^T$ is low [49,101-103]. Here $D$ is the “downshift” matrix,
with ones on its first subdiagonal, and zeros elsewhere; the operation above subtracts
the largest leading principal submatrix of $A$ from its lower right corner. Toeplitz
matrices  are  one  example  of  matrices  with  low  displacement  rank;  the  displacement 
rank  of  a Toeplitz  matrix  is  at  most  two.  Other  related  matrices,  such  as  inverses  and 
products  of  Toeplitz  matrices,  may  also  be  shown  to  have  low  displacement  rank. 
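
As a small check of the definition, the Python sketch below forms $A - DAD^T$ for a symmetric Toeplitz matrix and computes its rank by exact elimination over the rationals. The helper names and the example matrix are assumptions chosen for illustration.

```python
from fractions import Fraction

def toeplitz(first_col, first_row):
    """Dense Toeplitz matrix from its first column and first row."""
    n = len(first_col)
    return [[first_col[i - j] if i >= j else first_row[j - i]
             for j in range(n)] for i in range(n)]

def displacement(a):
    """A - D A D^T, where D is the downshift matrix: entry (i, j) of
    D A D^T is A(i-1, j-1) for i, j >= 1 and zero otherwise."""
    n = len(a)
    return [[a[i][j] - (a[i - 1][j - 1] if i > 0 and j > 0 else 0)
             for j in range(n)] for i in range(n)]

def rank(m):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

T = toeplitz([4, 3, 2, 1, 0], [4, 3, 2, 1, 0])
disp_rank = rank(displacement(T))
```

For a Toeplitz matrix the displacement is nonzero only in its first row and first column, which is why its rank cannot exceed two.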

The displacement rank approach produces an algorithm for solving an equation
with a matrix of displacement rank $p$ in $O(pM^2)$ operations. The covariance
estimate of $R$ is a product of Toeplitz matrices, which has displacement rank 3; a
matrix of the form $R - \lambda I$ has displacement rank 4, and $R - \lambda S$, where $S$ is Toeplitz,
has displacement rank 5. By using displacement rank algorithms in eigenvalue computations
similar to those performed here, it should be possible to develop $O(NM^2)$
methods for finding $N$ eigenvectors of a covariance matrix. This would remove the
performance limitation caused by the use of a Toeplitz matrix.

Toeplitz  Matrices  as  “Sparse”  Matrices 

A sparse  matrix  is  one  with  few  nonzero  elements;  from  an  algorithmic  point 
of  view,  the  distinguishing  characteristics  of  a sparse  matrix  are  that  it  requires  much 
less than $O(M^2)$ storage, and that matrix-vector multiplication requires less than
$O(M^2)$ operations. A wide variety of sparse matrix algorithms has been developed to
take advantage of these characteristics. From our examination of Toeplitz matrices, it
is clear that they share the algorithmic characteristics of sparse matrices: they require
$O(M)$ storage, and a matrix-vector multiplication may be performed in $O(M \log M)$
operations.  This  suggests  that  existing  sparse  matrix  algorithms  may  be  well-suited 
for  application  to  Toeplitz  matrices  as  well. 
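
The $O(M \log M)$ product can be sketched in Python by embedding the Toeplitz matrix in a circulant matrix and multiplying with FFTs. The sketch assumes a symmetric Toeplitz matrix stored as its first row, and uses a plain recursive radix-2 FFT written out here so that the example is self-contained; it is not the split-radix transform used in the appendices.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def toeplitz_matvec(t, x):
    """y = T x for the symmetric Toeplitz matrix with first row t,
    using a circulant embedding of size >= 2M."""
    m = len(t)
    size = 1
    while size < 2 * m:
        size *= 2
    # First column of the circulant: t_0..t_{M-1}, zeros, t_{M-1}..t_1
    c = list(t) + [0.0] * (size - 2 * m + 1) + list(reversed(t[1:]))
    xp = list(x) + [0.0] * (size - m)
    fc, fx = fft(c), fft(xp)
    y = fft([a * b for a, b in zip(fc, fx)], inverse=True)
    return [(v / size).real for v in y[:m]]

def direct_matvec(t, x):
    """Reference O(M^2) product, used only to check the result."""
    m = len(t)
    return [sum(t[abs(i - j)] * x[j] for j in range(m)) for i in range(m)]

t = [4.0, 3.0, 2.0, 1.0, 0.5]
x = [1.0, -2.0, 3.0, 0.5, 2.0]
y_fast = toeplitz_matvec(t, x)
y_ref = direct_matvec(t, x)
```

Only the $M$ entries of `t` are stored, and the cost is dominated by three FFTs of length at most $4M$.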

For the frequency estimation problem, it is necessary to calculate a few of the
largest or smallest eigenvalues and eigenvectors of a Toeplitz matrix. The Lanczos
algorithm computes a tridiagonal matrix whose extremal eigenvalues usually converge
to those of the input matrix in $O(\sqrt{M})$ iterations [78], and each iteration requires
one matrix-vector multiplication. If this is performed with the FFT, each iteration
requires $O(M \log M)$ operations, and the overall algorithm is $O(M^{3/2} \log M)$. This
asymptotic complexity is lower than that of the fast algorithms discussed here, but higher
than that of the superfast algorithms; one potential advantage is that the only computation
performed is matrix-vector multiplication, which is always stable. Like the algorithms
developed here, the Lanczos algorithm also is easily extended to the generalized eigenvalue
problem. In addition, since the fast methods for Toeplitz multiplication extend
easily to the computation of the product of a covariance matrix and a vector, the
Lanczos method provides another approach to a fast covariance eigensolver.
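
A minimal Python sketch of this idea follows. It runs the plain Lanczos recurrence (no reorthogonalization) on a symmetric Toeplitz matrix and finds the largest eigenvalue of the resulting tridiagonal matrix by bisection on a Sturm count. The direct product in `matvec` stands in for an FFT-based product to keep the example short; all names, the starting vector, and the test matrix are assumptions for illustration.

```python
import math

def matvec(t, x):
    """Direct symmetric Toeplitz product; an FFT-based product would
    replace this in an O(M log M) implementation."""
    m = len(t)
    return [sum(t[abs(i - j)] * x[j] for j in range(m)) for i in range(m)]

def lanczos(t, v0, steps):
    """Diagonal and off-diagonal of the Lanczos tridiagonal matrix."""
    norm = math.sqrt(sum(v * v for v in v0))
    q = [v / norm for v in v0]
    q_prev = [0.0] * len(q)
    beta = 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = matvec(t, q)
        alpha = sum(wi * qi for wi, qi in zip(w, q))
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(wi * wi for wi in w))
        if beta < 1e-12:          # invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas[:len(alphas) - 1]

def count_below(alphas, betas, x):
    """Sturm count: eigenvalues of the tridiagonal matrix less than x."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300
        if d < 0:
            count += 1
    return count

def largest_eigenvalue(alphas, betas):
    """Bisection between Gershgorin bounds for the largest eigenvalue."""
    bound = max(abs(a) for a in alphas) + 2 * max(betas, default=0.0)
    lo, hi = -bound, bound
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) == len(alphas):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

t = [2.0, -1.0] + [0.0] * 6            # 8 x 8 example
alphas, betas = lanczos(t, [float(i) for i in range(1, 9)], steps=8)
lam_max = largest_eigenvalue(alphas, betas)
```

For this test matrix the largest eigenvalue is $2 + 2\cos(\pi/9)$, which the sketch reproduces when the recurrence is run for the full eight steps; in practice far fewer steps than $M$ would be used.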

The use of the Lanczos algorithm to compute extremal eigenvalues of Toeplitz
matrices was apparently first suggested by Ikramov [104]. In contrast to the Cybenko-Van
Loan approach developed here, which has been pursued by several researchers,
very little further study of the Lanczos approach has been performed; only a brief
work by Huckle [105], and a recent paper by Xu and Kailath [106] have addressed
this question.

Limits  on  Frequency  Estimation 

The  Bhattacharyya  bounds  developed  in  Chapter  2 are  the  tightest  known 
bounds  for  sinusoidal  frequency  estimation.  They  are  still,  however,  smaller  than  the 
variance  of  all  known  frequency  estimation  algorithms  in  the  threshold  region,  that 
is,  for  short  data  records,  low  signal-to-noise  ratios,  and  closely  spaced  sinusoids.  The 
fact  that  several  different  techniques,  including  exact  maximum  likelihood,  MUSIC, 
and  ESPRIT,  all  show  very  similar  behavior  in  the  threshold  region  is  intriguing:  it 
suggests  two  possibilities.  First,  it  is  possible  that  these  apparently  very  different 
techniques  share  some  common  feature  that  limits  their  performance  in  the  threshold 
region;  in  this  case,  identifying  this  feature  might  lead  to  estimation  techniques  with 
better  threshold  performance. 

The second possibility is that the estimators’ performance in the threshold
region is limited by an as yet unknown bound. The second-order Bhattacharyya
bounds developed here may be extended to higher orders, resulting in tighter bounds;
although it is probably impractical to compute all the elements of the matrix which defines
the third-order bound, it may be feasible to add selected higher order derivatives
to the vector used to compute the second-order bound. If tighter bounds computed
in this manner show that the performance of the best currently known frequency
estimators is close to optimal, this would have strong implications for the development
of future algorithms, shifting the focus of development away from increasing
performance and toward enhancing efficiency.

In conclusion, it is pleasing to note that the approach described here for improving
the efficiency of subspace frequency estimators appears not only to have
useful  application  to  situations  where  large  amounts  of  data  are  available,  but  has 
also  pointed  the  way  to  several  new  approaches  for  solving  related  problems.  It  is 
hoped  that  the  pursuit  of  these  new  approaches  is  as  fruitful  as  the  investigation 
described  here. 


APPENDIX  A 

REVIEW  OF  MATRIX  ALGEBRA 


In  this  appendix,  a brief  review  of  several  important  facts  from  matrix  algebra 
will  be  given.  To  avoid  the  necessity  of  repeating  the  characteristics  of  the  matrix  for 
each  fact,  we  will  always  assume  that  x is  a vector  of  length  n,  whose  elements  are 
denoted $x(i)$, and $A$ and $B$ are $n \times n$ matrices, whose entries are denoted $A(i,j)$ or
$B(i,j)$.


Norms 

Norms  are  used  to  measure  the  size  of  a matrix  or  vector.  The  most  common 
vector  norms  are: 


\[ \|x\|_1 = \sum_{i=1}^{n} |x(i)|, \]

\[ \|x\|_2 = \sqrt{\sum_{i=1}^{n} |x(i)|^2}, \]

\[ \|x\|_\infty = \max_i |x(i)|. \]


Each vector norm $\|x\|_\alpha$ can also be used to define a matrix norm:

\[ \|A\|_\alpha = \max_{x \neq 0} \frac{\|Ax\|_\alpha}{\|x\|_\alpha}. \]


In  some  cases,  a matrix  norm  may  be  easily  computed  from  the  matrix  itself: 


\[ \|A\|_1 = \max_j \sum_{i=1}^{n} |A(i,j)|, \]

\[ \|A\|_\infty = \max_i \sum_{j=1}^{n} |A(i,j)|, \]

\[ \|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} |A(i,j)|^2}. \]

The  norm  ||A||F  is  the  Frobenius  norm,  and  is  not  derived  from  a vector  norm.  The 
norm  ||A||2  does  not  have  a simple  expression  in  terms  of  the  elements  of  A. 
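
The three directly computable matrix norms (maximum absolute column sum, maximum absolute row sum, and the Frobenius norm) can be written out in a few lines of Python. The function names and the example matrix are illustrative assumptions.

```python
import math

def norm_1(a):
    """Maximum absolute column sum."""
    return max(sum(abs(a[i][j]) for i in range(len(a)))
               for j in range(len(a[0])))

def norm_inf(a):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def norm_fro(a):
    """Frobenius norm: square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))

A = [[1.0, -2.0],
     [3.0, 4.0]]
n1, ninf, nfro = norm_1(A), norm_inf(A), norm_fro(A)
```

For this matrix the column sums are 4 and 6, the row sums are 3 and 7, and the sum of squares is 30.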

A norm $\|A\|_\alpha$ is unitarily invariant if, for any unitary matrices $U$ and $V$ of
the proper dimensions,

\[ \|UAV\|_\alpha = \|A\|_\alpha. \]

Special  Matrices 

A matrix is Hermitian if, for every element, $A(i,j) = A(j,i)^*$, where $*$ denotes
complex conjugation; a matrix is symmetric if all of its entries are real and $A(i,j) = A(j,i)$.
All symmetric matrices are Hermitian.

A Hermitian matrix is positive definite if, for all nonzero vectors $x$, $x^H A x > 0$;
it is positive semidefinite if $x^H A x \geq 0$ for all $x$. All of the eigenvalues and all of the
diagonal entries of a positive definite (positive semidefinite) matrix are greater than
(greater than or equal to) zero.

A Toeplitz matrix is one for which $A(i,j) = A(k,l)$ whenever $i - j = k - l$. A
Toeplitz matrix is completely determined by its first row and its first column.

A unitary matrix is one for which

\[ A^H = A^{-1}. \]

Matrix Eigendecomposition

The eigenvalues $\lambda_i$ of a matrix are the roots of the characteristic polynomial
$\det(\lambda I - A)$; there are at most $n$ distinct eigenvalues. The eigenvectors of a matrix
are the nonzero vectors $e_i$ that satisfy

\[ A e_i = \lambda_i e_i, \]

where $\lambda_i$ is an eigenvalue of $A$; there are at most $n$ linearly independent eigenvectors.

The eigenvectors of a matrix are not unique, since if $e_i$ is an eigenvector, then
for any nonzero constant $C$, $C e_i$ is also an eigenvector. By convention, this ambiguity
is resolved by normalizing each eigenvector so that its length is one.

An invariant subspace of a matrix $A$ is a space $S$ such that if $x \in S$, then
$Ax \in S$. An eigenvector defines a one-dimensional invariant subspace.

If $A$ has repeated eigenvalues, that is, if its characteristic polynomial has
repeated roots, then the normalized eigenvectors associated with the repeated eigenvalues
are not unique, since if $e_i$ and $e_j$ are eigenvectors associated with a repeated
eigenvalue, then any (normalized) linear combination of $e_i$ and $e_j$ is also an eigenvector.

If $A$ is Hermitian, then all of its eigenvalues are real; if $A$ is symmetric, then
both its eigenvalues and eigenvectors are real. In either case, $n$ orthonormal eigenvectors
of $A$ may always be found; but the eigenvectors associated with repeated
eigenvalues are not uniquely determined.

The eigenvalues of a matrix may be shifted by any value $\alpha$, without changing
the eigenvectors, by adding a multiple of $I$ to the matrix, since

\[ (A + \alpha I) e_i = (\lambda_i + \alpha) e_i. \]

The product of the eigenvalues of a matrix is equal to its determinant, and the
sum of the eigenvalues is equal to its trace, that is, to the sum of its diagonal entries.

If $X$ is $n \times n$ and nonsingular, then $A$ and $X^{-1}AX$ have the same eigenvalues;
$A$ and $X^{-1}AX$ are similar.

The inertia of a Hermitian matrix is the number of positive, zero, and negative
eigenvalues; if $X$ is $n \times n$ and nonsingular, then $A$ and $X^H A X$ have the same inertia.

All of the eigenvalues $\lambda$ of $A$ lie in the regions defined by

\[ |A(i,i) - \lambda| \leq \sum_{j \neq i} |A(i,j)|; \]

this is known as Gershgorin’s circle theorem.
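
A quick numerical illustration of the circle theorem in Python: each disc is centered on a diagonal entry with radius equal to the sum of the magnitudes of the other entries in that row. The example uses a tridiagonal Toeplitz matrix whose eigenvalues are known in closed form, $2 - 2\cos(k\pi/5)$ for $k = 1, \ldots, 4$; the function names are assumptions for illustration.

```python
import math

def gershgorin_discs(a):
    """List of (center, radius) pairs, one disc per row."""
    n = len(a)
    return [(a[i][i], sum(abs(a[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def in_some_disc(lam, discs):
    """True if lam lies in at least one disc."""
    return any(abs(center - lam) <= radius for center, radius in discs)

A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
discs = gershgorin_discs(A)
eigenvalues = [2 - 2 * math.cos(k * math.pi / 5) for k in range(1, 5)]
all_inside = all(in_some_disc(lam, discs) for lam in eigenvalues)
```

Here every disc is centered at 2 with radius 1 or 2, so the theorem confines all eigenvalues to the interval from 0 to 4.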

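If $A$ is Hermitian, $\hat{A}$ is an $(n-1) \times (n-1)$ principal submatrix of $A$, and the
eigenvalues of $\hat{A}$ and $A$ are $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_{n-1}$ and $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$,
respectively, then $\lambda_1 \geq \sigma_1 \geq \lambda_2 \geq \sigma_2 \geq \ldots \geq \lambda_{n-1} \geq \sigma_{n-1} \geq \lambda_n$.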

Generalized  Eigendecomposition 

The generalized eigenvalues of a pair of matrices $A$ and $B$ are the $n$ roots
of the polynomial $\det(\lambda B - A)$. A pair of matrices $(A, B)$ is often referred to as a
matrix pencil.

The generalized eigenvectors of the matrix pencil $(A, B)$ are the nonzero vectors
satisfying

\[ A e_i = \lambda_i B e_i, \]

where $\lambda_i$ is a generalized eigenvalue.

A generalized invariant subspace (sometimes called a deflating subspace) of
a pencil $(A, B)$ is a space $S$ such that if $Bx \in S$, then $Ax \in S$. A generalized
eigenvector defines a one-dimensional generalized invariant subspace.

If both $A$ and $B$ are Hermitian, and there exists a matrix $C = \alpha A + (1 - \alpha)B$
that is positive definite for some $\alpha$ between 0 and 1, inclusive, then the pencil $(A, B)$
is definite. Notice that the pencil is definite if either $A$ or $B$ is positive definite.

The generalized eigenvalues of a definite pencil are real, and $n$ generalized
eigenvectors may always be found that are linearly independent and orthogonal with
respect to the inner product defined by $B$, that is,

\[ e_i^H B e_j = 0 \quad \text{for } i \neq j. \]

APPENDIX  B 

ROUTINES  FOR  FAST  SOLUTION  OF  TOEPLITZ  SYSTEMS 

Introduction 

This section contains Fortran-90 subroutines which implement a superfast algorithm
for the solution of Toeplitz systems. The subroutines employ a variant of
the algorithm described by Ammar and Gragg [45]. The algorithm has been modified
in two ways: the sign of the input a has been reversed, and the forward
Fourier transform is used in place of the inverse Fourier transform, and vice versa.
The algorithm presented here computes the solution of $T_n a = -t$, where $T_n$ is an
$n \times n$ symmetric Toeplitz matrix, with $n = 2^k$, using $8n \log^2 n + 8n \log n + O(n)$ real
arithmetic operations.

The  only  features  of  the  Fortran-90  language  used  in  these  subroutines  which 
are  not  also  legal  in  Fortran-77  are  the  do . . . enddo  construct,  and  recursion.  Both 
of  these  features  are  widely  supported  extensions  to  the  Fortran-77  language,  so  these 
subroutines  will  compile  properly  on  many  Fortran-77  systems  as  well. 

These  subroutines,  and  those  in  the  other  appendices  which  contain  computer 
programs,  were  written  using  the  WEB  programming  system  [107].  This  system  allows 
both  the  source  code  and  documentation  to  be  written  into  a single  combined  source 
file,  which  is  processed  to  generate  the  documentation  and  the  program  source  code. 
This  ensures  that  the  documentation  matches  the  program  exactly. 



Listings 


1.  Subroutine  ammar  is  a driver  routine  that  has  the  same  parameters  as  the 
Levinson-Durbin  algorithm:  r is  the  autocorrelation,  a the  solution,  e the  prediction 
errors,  and  n the  size  of  the  problem.  Note  that  the  n input  to  ammar  is  the  length 
of  the  autocorrelation  sequence,  which  is  one  greater  than  the  size  of  the  matrix  to 
be  solved. 


The WEB system provides a macro facility similar to the C language preprocessor’s
#define. This remedies a serious defect in the Fortran language. We will use the
macro facility to define the largest allowable matrix; this value is needed to properly
dimension arrays in several program units. The constant NMAX is the maximum
value of n.

@m NMAX 2**14


2. The macro facility will also be used to simplify changing the precision of the
program. A single precision version may be produced by redefining FLOAT to real,
FLT(x) to real(x), and the constants to real numbers.

@m FLOAT double precision
@m FLT(x) dble(x)

@m ZERO 0.D0
@m ONE 1.D0


3. The subroutine ammar itself checks the validity of its arguments, performs the
generalized Schur algorithm, and returns the solution to the Yule-Walker problem
$Ta = -t$ in a. If the solution to the general symmetric Toeplitz system $Tx = b$
is required, a may be used in the Gohberg-Semencul formula to calculate $T^{-1}$, as
described in Chapter 5.

      subroutine ammar(r, a, e, n)
      integer n
      FLOAT r(n + 1), a(n), e(n + 1)
      (Declare ammar's local variables. 4)
      (Check that n is valid. 5)
      (Solve the Yule-Walker problem. 6)
      return
      end


4. The only local variables used are u and v, which store the Fourier transforms
of the vectors produced by the Schur algorithm. The algorithm as a whole requires
O(NMAX) storage.

(Declare ammar's local variables. 4) =
      integer iex, i
      FLOAT u(NMAX), v(NMAX)

This code is used in section 3.


5. The input value of n is checked for validity.

(Check that n is valid. 5) =
      iex = nint(log(FLT(n))/log(FLT(2)))
      if (2**iex /= n) then
        stop 'Error: n must be a power of 2 in ammar().'
      endif
      if (n < 1) then
        stop 'Error: n is less than 2 in ammar().'
      endif
      if (n > NMAX) then
        stop 'Error: n is too large in ammar().'
      endif

This code is used in section 3.


6. The generalized Schur algorithm is used to solve the Yule-Walker equations. One
of the differences between this implementation and that of Ammar and Gragg is that
the sign of a has been reversed. This allows the input vector r to be used directly as
input to the subroutine that implements the algorithm.

In this section and those below, a matrix notation will be used to describe the
calculations. The matrices $F_n$ and $F_n^{-1}$ represent the forward and inverse discrete
Fourier transforms, respectively; $S_n$ and $C_n$ represent noncyclic and cyclic downward
shift matrices, and $[a, b]$ represents a matrix formed from the column vectors a and
b. In this notation, the first portion of ammar performs the operation

\[ a = F_n^{-1} u + S_n F_n^{-1} v. \]

These operations could be performed as shown above; however, one inverse FFT can
be saved by combining the vectors in the transform domain. This is possible because
the first coefficient of $F_n^{-1} v$ is always one. If the Fourier transform of a vector with one
as its first entry and zeros elsewhere is subtracted from v, the resulting vector may
then be circularly shifted to yield the desired result. These operations are performed
by the subroutine fixv.

The last step is the calculation of the final prediction error e(n).

(Solve the Yule-Walker problem. 6) =
      call gsa(r(2), r(1), u, v, e, n)
      call fixv(v, n)
      do i = 1, n
        a(i) = u(i) + v(i)
      enddo
      call irfft(a, n)
      e(n + 1) = (ONE - a(n))*(ONE + a(n))*e(n)

This code is used in section 3.


7. Subroutine gsa is the heart of the implementation; it is the only recursive routine
required. The keyword recursive is required by standard Fortran-90; it may also be
required by some Fortran-77 compilers.

Since the ordinary (nonrecursive) Schur algorithm is more efficient for small
n, it is used once the problem size falls below a threshold. The size at which the
recursive (split) algorithm becomes faster is given by the macro NSPLIT.

@m NSPLIT 64

      recursive subroutine gsa(a, b, u, v, e, n)
      integer n
      FLOAT a(n), b(n), u(n), v(n), e(n + 1)
      (Declare gsa's local variables. 8)
      if (n < NSPLIT) then
        call schur(a, b, u, v, e, n)
      else
        (Solve using the generalized Schur algorithm. 9)
      endif
      return
      end


8. The subroutine gsa requires local storage proportional to NMAX. Since the
subroutine is recursive, the program must be designed so that local variables do not
overwrite themselves on successive calls. There are three general methods to ensure
this. The first is to allocate local variables on the program stack; this requires a very
large stack, and it is generally impossible for the program to determine if enough stack
space is available at run time. The second option is to dynamically allocate memory;
this is standard in Fortran-90, but there is no standard syntax for it even in those
Fortran-77 compilers that support dynamic allocation. The third option is to use
standard statically allocated arrays, and a slightly more complex indexing scheme, to
ensure that each level of recursive execution uses different memory locations. This is
the approach adopted here.


The  arrays  that  are  reused  by  recursive  calls  are  first  listed  in  a save  statement, 
to  prevent  Fortran-90  compilers  from  allocating  them  on  the  stack.  Then,  each  array 
is  declared  to  be  twice  the  size  required  by  the  highest  level  subroutine.  On  each 
recursive  call,  for  a workspace  size  of  k,  the  memory  locations  k + 1 to  2k  are  used, 
ensuring  that  each  level  of  recursion  uses  different  memory  locations.  For  example,  if 
the  number  of  memory  locations  required  was  8,  the  first  level  subroutine  call  would 
use  locations  9 through  16,  the  second  level  recursive  calls  would  use  locations  5 
through  8,  and  so  on. 
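
The indexing scheme can be checked with a few lines of Python. The sketch below lists the index range k + 1 to 2k used at each level as the workspace size k is halved, and verifies that the ranges never overlap. The function name is hypothetical, and the sketch follows the halving all the way down to k = 1, whereas gsa stops recursing once the problem size falls below NSPLIT.

```python
def workspace_ranges(k):
    """1-based inclusive index ranges (k + 1, 2k) used at successive
    recursion levels, starting from a top-level workspace size k."""
    ranges = []
    while k >= 1:
        ranges.append((k + 1, 2 * k))
        k //= 2
    return ranges

ranges = workspace_ranges(8)

# Collect every index touched at any level and check for overlap.
used = [i for lo, hi in ranges for i in range(lo, hi + 1)]
disjoint = len(used) == len(set(used))
```

Because each level uses the upper half of what remains below the previous level, an array of twice the top-level size is enough for every level of recursion.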

(Declare gsa's local variables. 8) =
      save p, q, r, s, c, d, p1, q1, a1, b1, t1, t2, x1, y1, u1, v1
      integer i, m, np1, mp1
      FLOAT p(2*NMAX), q(2*NMAX), r(2*NMAX), s(2*NMAX)
      FLOAT x1(NMAX/2), y1(NMAX/2), u1(NMAX), v1(NMAX)
      FLOAT c(NMAX), d(NMAX), p1(2*NMAX), q1(2*NMAX)
      FLOAT a1(NMAX), b1(NMAX), t1(NMAX), t2(NMAX)

This code is used in section 7.


9. Each step of the generalized Schur algorithm makes two recursive calls to solve
problems of half the size, and performs $O(n \log n)$ additional computation, leading
to an overall $O(n \log^2 n)$ computational load. The symbols used for the variables are
those used in [45] with two exceptions, noted below.

(Solve using the generalized Schur algorithm. 9) =
      m = n/2
      mp1 = m + 1
      np1 = n + 1
      call gsa(a, b, u1(mp1), v1(mp1), e, m)
      (Compute p and q. 10)
      (Compute r and s. 11)
      (Compute c and d. 12)
      (Compute a_1 and b_1. 13)
      call gsa(a1(mp1), b1(mp1), u1(mp1), v1(mp1), e(mp1), m)
      (Compute p_1 and q_1. 14)
      (Compute u and v. 15)

This code is used in section 7.


10. After the first recursive call, the outputs of the previous stage are used to
calculate the inputs to the second call. In the first step below, the vectors $x_1$ and $y_1$
are calculated:

\[ [x_1, y_1] = F_{n/2}^{-1} [u_1, v_1]. \]

Since $u_1$ and $v_1$ are the Fourier transforms of real vectors, a real split-radix IFFT
may be used.

In the original implementation [45] the output vectors were $x_0$ and $y_0$, but
since the two sets of vectors are never needed simultaneously, the same array may be
shared with the $x_1$ and $y_1$ needed later.

The second step calculates

\[ [p, q] = F_n \begin{bmatrix} x_1 & y_1 \\ 0 & 0 \end{bmatrix}. \]

This appears to require two length n real-valued FFTs, but since the length n/2
Fourier coefficients are already known, Ammar and Gragg recognized that each of
the two length n FFTs could be calculated using only one additional complex FFT
of length n/4. This calculation is performed by subroutine zpcal.

(Compute p and q. 10) =
      do i = 1, m
        x1(i) = u1(m + i)
        y1(i) = v1(m + i)
      enddo
      call irfft(x1, m)
      call irfft(y1, m)
      call zpcal(x1, u1(mp1), p(np1), n)
      call zpcal(y1, v1(mp1), q(np1), n)

This code is used in section 9.


11. The values of r and s are given by

\[ [r, s] = F_n E_n \begin{bmatrix} J_m x_1 & J_m y_1 \\ 0 & 0 \end{bmatrix}. \]

These values may be calculated by simply rearranging p and q, so no additional
operations are required. The rearrangement is performed by subroutine flip.

(Compute r and s. 11) =
      call flip(p(np1), r(np1), n)
      call flip(q(np1), s(np1), n)

This code is used in section 9.


12. The vectors c and d are just the Fourier transforms of the input vectors a and
b, adjusted to take account of the fact that the sign of a has been reversed.

\[ [c, d] = F_n [-a, b] \]

(Compute c and d. 12) =
      do i = 1, n
        c(i) = -a(i)
        d(i) = b(i)
      enddo
      call rfft(c, n)
      call rfft(d, n)

This code is used in section 9.


13. The vectors $a_1$ and $b_1$ are the inputs to the second recursive call of gsa.

\[ a_1 = F_n^{-1} [d \cdot p - c \cdot q] \]
\[ b_1 = F_n^{-1} [d \cdot s - c \cdot r] \]

(Compute a_1 and b_1. 13) =
      call fftmul(c, q(np1), t1, n)
      call fftmul(d, p(np1), t2, n)
      do i = 1, n
        t1(i) = t2(i) - t1(i)
      enddo
      call irfft(t1, n)
      do i = m + 1, n
        a1(i) = t1(i)
      enddo
      call fftmul(d, s(np1), t1, n)
      call fftmul(c, r(np1), t2, n)
      do i = 1, n
        t1(i) = t1(i) - t2(i)
      enddo
      call irfft(t1, n)
      do i = m + 1, n
        b1(i) = t1(i)
      enddo

This code is used in section 9.


14. The computation of $p_1$ and $q_1$ takes advantage of the same “trick” used to
compute p and q.

\[ [p_1, q_1] = F_n \begin{bmatrix} x_1 & y_1 \\ 0 & 0 \end{bmatrix} \]

(Compute p_1 and q_1. 14) =
      do i = 1, m
        x1(i) = u1(m + i)
        y1(i) = v1(m + i)
      enddo
      call irfft(x1, m)
      call irfft(y1, m)
      call zpcal(x1, u1(mp1), p1(np1), n)
      call zpcal(y1, v1(mp1), q1(np1), n)

This code is used in section 9.


15. Finally, the Fourier transforms of the results are computed.

$$u = [s \cdot p_1 + p \cdot q_1]$$
$$v = [r \cdot p_1 + q \cdot q_1]$$

⟨Compute u and v 15⟩ ≡

call fftmul(s(np1), p1(np1), t1, n)
call fftmul(p(np1), q1(np1), t2, n)
do i = 1, n
  u(i) = t1(i) + t2(i)
enddo

call fftmul(r(np1), p1(np1), t1, n)
call fftmul(q(np1), q1(np1), t2, n)
do i = 1, n
  v(i) = t1(i) + t2(i)
enddo


This  code  is  used  in  section  9. 




16. The normal Schur algorithm is used for n < NSPLIT.

subroutine schur(a, b, u, v, e, n)
integer n
FLOAT a(n), b(n), u(0:n - 1), v(0:n - 1), e(n + 1)
⟨allocate schur's local variables and initialize 17⟩
do k = 1, n
  ⟨calculate alpha_k and beta_k 18⟩
  ⟨calculate u_k and v_k 19⟩
enddo

e(n + 1) = be(0)
call rfft(u, n)
call rfft(v, n)
return
end


17. ⟨allocate schur's local variables and initialize 17⟩ ≡

integer k, j, kmj
FLOAT xk(NSPLIT)
FLOAT al(0:NSPLIT - 1), be(0:NSPLIT - 1)
FLOAT t0(0:NSPLIT - 1), t1(0:NSPLIT - 1)
do j = 1, n - 1
  u(j) = ZERO
  v(j) = ZERO
enddo

u(0) = ZERO
v(0) = ONE
xk(1) = ZERO

This  code  is  used  in  section  16. 


18. ⟨calculate alpha_k and beta_k 18⟩ ≡

al(k - 1) = -a(k)
be(k - 1) = b(k)
do j = 1, k - 1
  kmj = k - j
  al(kmj - 1) = al(kmj) - xk(j)*be(kmj)
  be(kmj) = be(kmj) - xk(j)*al(kmj)
enddo

xk(k) = al(0)/be(0)
e(k) = be(0)
be(0) = be(0)*(ONE - xk(k))*(ONE + xk(k))


This  code  is  used  in  section  16. 
19. ⟨calculate u_k and v_k 19⟩ ≡

do j = 0, k - 1
  t0(j) = u(j)
  t1(j) = v(j)
enddo

do j = 0, k - 1
  kmj = k - j - 1
  u(j) = xk(k)*t1(kmj) + t0(j)
  v(j) = xk(k)*t0(kmj) + t1(j)
enddo


This  code  is  used  in  section  16. 
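For readers who want to experiment with sections 16 to 19, the following Python sketch transliterates the loop structure of schur using zero-based lists. It is a reading aid under stated assumptions, not the author's code: the final rfft calls on u and v are omitted, the FLOAT, ZERO, and ONE macros are replaced by Python floats, and the function returns xk alongside e, u, and v purely for inspection.

```python
def schur(a, b):
    """Sketch of sections 16-19 without the final rfft calls.

    a and b are lists of equal length n. Fortran arrays that start at
    index 1 (xk, e) are stored here with an unused slot at index 0.
    """
    n = len(a)
    al = [0.0] * n          # al(0:n-1)
    be = [0.0] * n          # be(0:n-1)
    xk = [0.0] * (n + 1)    # xk(1:n)
    e = [0.0] * (n + 2)     # e(1:n+1)
    u = [0.0] * n           # u(0:n-1)
    v = [0.0] * n           # v(0:n-1)
    v[0] = 1.0
    for k in range(1, n + 1):
        # Section 18: update al and be, then form xk(k) and e(k).
        al[k - 1] = -a[k - 1]
        be[k - 1] = b[k - 1]
        for j in range(1, k):
            kmj = k - j
            al[kmj - 1] = al[kmj] - xk[j] * be[kmj]
            be[kmj] = be[kmj] - xk[j] * al[kmj]
        xk[k] = al[0] / be[0]
        e[k] = be[0]
        be[0] = be[0] * (1.0 - xk[k]) * (1.0 + xk[k])
        # Section 19: update u and v from saved copies.
        t0 = u[:k]
        t1 = v[:k]
        for j in range(k):
            kmj = k - j - 1
            u[j] = xk[k] * t1[kmj] + t0[j]
            v[j] = xk[k] * t0[kmj] + t1[j]
    e[n + 1] = be[0]
    return xk[1:], e[1:], u, v
```

A convenient trial input, chosen here as an assumption and not taken from the source, is a = [0.5, 0.25, 0.125] and b = [1.0, 0.5, 0.25]. For this input only the first entry of xk is nonzero, and e stays constant after its first entry.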